Passing numpy arrays to matlab
david at ar.media.kyoto-u.ac.jp
Thu Nov 9 00:32:04 CST 2006
Josh Marshall wrote:
> I don't see how you are going to get around doing the copies. Matlab
> is in a separate process from the Python interpreter, and there is no
> shared memory. In what way do you want these proxy classes to "look
> like numpy arrays"?
I am not talking about the copy in the matlab <-> python interaction.
This is done through a pipe, handled by the OS; I don't know the details,
but I know that communication through a pipe is quite fast under Linux
(see below), and it is not the bottleneck.
> Note that mlabwrap creates proxy arrays, and only copies the data if
> you actually request it to. (AFAIRemember) Otherwise you aren't losing
> any speed, because there aren't going to be any copies.
There may be no copy for returned data you don't need, but that's not
the case I am talking about. For all other cases, I don't think this is
what's happening: if you look at the C mlabraw module of mlabwrap, the
function mlabraw_put always calls numeric2mx for arrays, which itself
always calls makeMxFromNumeric, which makes a copy. The same happens in
the other direction once you call mlabwrap_get. I do the same in my
module, because that's the simplest thing to do.
The problem is that when you use the function engPutVariable of the
matlab engine API, you need to give it a pointer to an mxArray
structure, which is the C representation of a matlab array. You cannot
say "build an mxArray from existing data" (this is one of the
brain-damaged things about the matlab C API I was talking about in
another mail): this is the copy I am talking about, and it is an
expensive one. In the best case (real numpy arrays with Fortran
storage), you can do a memcpy, but in most cases you need to do
something which takes strides into account (because complex matlab
arrays are actually not in Fortran layout, or because by default most
numpy arrays use C storage, which makes a difference for rank >= 2).
This implies non-contiguous memory access, which is *really* expensive
(around 2 cycles/byte at best, on my dual Xeon 3.2 GHz).
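To make the layout problem concrete, here is a small numpy sketch. It assumes the split storage of the matlab C API discussed here (a column-major real block plus, for complex data, a separate column-major imaginary block); the helper `matlab_layout_buffers` is hypothetical and only prepares numpy buffers, it does not call any matlab function.

```python
import numpy as np

def matlab_layout_buffers(a):
    """Hypothetical helper: return the column-major buffers a split-complex
    mxArray would be filled from, as (real part, imaginary part or None)."""
    a = np.asarray(a)
    if np.iscomplexobj(a):
        z = a.astype(np.complex128, copy=False)
        # numpy interleaves real and imaginary parts; split storage needs two
        # separate column-major blocks, so both are strided copies.
        return np.asfortranarray(z.real), np.asfortranarray(z.imag)
    # Float64 data that is already column-major comes back without copying:
    # this is the "plain memcpy" best case.
    return np.asfortranarray(a, dtype=np.float64), None

c = np.arange(6.0).reshape(2, 3)    # numpy default: C (row-major) order
f = np.asfortranarray(c)            # same values, column-major
print(c.strides, f.strides)         # (24, 8) (8, 16)

re, im = matlab_layout_buffers(c)
print(np.shares_memory(re, c))      # False: C-ordered data had to be reordered
re, im = matlab_layout_buffers(f)
print(np.shares_memory(re, f))      # True: already column-major, no reordering

v = np.arange(4.0)                  # rank 1: C and Fortran order coincide
print(v.flags["C_CONTIGUOUS"], v.flags["F_CONTIGUOUS"])   # True True
```

The rank-1 case shows why the C/Fortran distinction only starts to cost something for rank >= 2.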
Basically, if you want to do something like calling the resample
function of matlab on a numpy array and using the result later in
numpy, here is what's happening right now:
1. copy the numpy data (or numarray in the case of mlabwrap, but this
   should not matter, I guess) into an mxArray
2. send the mxArray to the matlab engine through a pipe (does this imply
   a copy? If so, it is at least a contiguous array copy)
3. compute the thing in matlab
4. send the result back to python as an mxArray
5. copy the data of the mxArray into a numpy array
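The data path of these steps can be simulated without matlab at all. In this sketch (all function names are hypothetical), step 1 is a copy into column-major bytes, an OS pipe stands in for steps 2 and 4, step 3 is skipped, and step 5 copies the bytes back into a numpy array.

```python
import os
import threading
import numpy as np

def pack_column_major(a):
    """Step 1 stand-in (hypothetical): copy the array into column-major bytes,
    the layout of an mxArray's real-data block."""
    return np.asarray(a, dtype=np.float64).tobytes(order="F")

def send_through_pipe(payload):
    """Steps 2 and 4 stand-in: push bytes through an OS pipe and read them back.
    The writer runs in a thread so payloads larger than the pipe buffer
    cannot deadlock."""
    r, w = os.pipe()

    def writer():
        with os.fdopen(w, "wb") as f:
            f.write(payload)

    t = threading.Thread(target=writer)
    t.start()
    with os.fdopen(r, "rb") as f:
        data = f.read()
    t.join()
    return data

def unpack_column_major(buf, shape):
    """Step 5 stand-in: build an independent numpy array from column-major bytes."""
    # frombuffer gives a read-only view of the bytes; reshape(order="F")
    # reinterprets them column-major, and copy() makes an owned array.
    return np.frombuffer(buf, dtype=np.float64).reshape(shape, order="F").copy()

a = np.arange(12.0).reshape(3, 4)
buf = pack_column_major(a)                  # step 1: copy into "mxArray" layout
received = send_through_pipe(buf)           # steps 2 and 4 (step 3 skipped)
b = unpack_column_major(received, a.shape)  # step 5: copy back into numpy
print(np.array_equal(a, b))                 # True
```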
A quick profiling run shows that if you don't do any processing in
matlab, just sending an array and getting it back, steps 1 and 5 take
roughly 80-90% of the time in my implementation; the remaining 10-20% is
spent on communication through the pipe. (My implementation is faster
than mlabwrap, but I think this is just caused by mlabwrap's much
fancier API: the core mechanism for passing arrays should be roughly the
same, as mlabwrap uses the C function makeMxFromNumeric, and I am using
a similar function myself through ctypes.) I believe that most typical
use cases involve steps 1 and 5.
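A rough way to look at the numpy side of this cost is to time a plain contiguous copy against a C-to-Fortran reordering copy of the same array, as loose stand-ins for the cheap and expensive variants of steps 1 and 5. This is not the profiling setup described above: no mxArray or pipe is involved, `best_time` is a hypothetical helper, and the numbers depend entirely on the machine (one would expect the reordering copy to be the slower of the two).

```python
import time
import numpy as np

def best_time(fn, repeat=5):
    """Best wall-clock time of fn() in seconds over `repeat` runs."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

a = np.random.default_rng(0).standard_normal((1000, 1000))  # C order, 8 MB

# Contiguous copy: roughly the memcpy best case.
t_contig = best_time(lambda: a.copy(order="C"))
# C -> Fortran reordering copy: the strided case.
t_strided = best_time(lambda: np.asfortranarray(a))

print(f"contiguous copy:   {t_contig * 1e3:.2f} ms")
print(f"C -> Fortran copy: {t_strided * 1e3:.2f} ms")
```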
Step 5 should be avoidable in many cases if I knew how to build a proxy
class around the mxArray so that the proxy behaves as a numpy array,
with the buffer owned by the mxArray; but I don't know how to do that
(in particular, how to handle destruction of the data, as the proxy
should destroy the mxArray once the proxy object is garbage collected).
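One possible approach, sketched under assumptions: numpy accepts any object exposing `__array_interface__` and keeps that object as the resulting array's `base`, so every array and view built on it keeps it alive. Attaching the foreign destructor to that object with `weakref.finalize` then runs it only when the last array or view is gone (immediately, under CPython's reference counting). Here a ctypes buffer stands in for the mxArray's data and `destroy` is a plain callable; in a real binding it would presumably call mxDestroyArray, which is an assumption and is not exercised here. `ForeignOwner` is a hypothetical name.

```python
import ctypes
import weakref
import numpy as np

class ForeignOwner:
    """Hypothetical proxy owning a foreign column-major float64 block.

    The ctypes buffer stands in for the mxArray's data; `destroy` stands in
    for whatever frees the foreign object (an assumption: nothing here
    calls matlab)."""

    def __init__(self, rows, cols, destroy):
        self._shape = (rows, cols)
        self._raw = (ctypes.c_double * (rows * cols))()   # zero-initialised
        # Call destroy() exactly once, when this object is garbage collected.
        # destroy must not reference self, or the object would never die.
        self._finalizer = weakref.finalize(self, destroy)

    @property
    def __array_interface__(self):
        rows, _ = self._shape
        itemsize = ctypes.sizeof(ctypes.c_double)
        return {
            "version": 3,
            "shape": self._shape,
            "typestr": np.dtype(np.float64).str,            # native byte order
            "data": (ctypes.addressof(self._raw), False),   # (pointer, read-only?)
            "strides": (itemsize, itemsize * rows),          # column-major
        }

freed = []
owner = ForeignOwner(2, 3, destroy=lambda: freed.append(True))
arr = np.asarray(owner)          # wraps the buffer without copying; arr.base is owner
arr[:] = np.arange(6.0).reshape(2, 3)
del owner                        # the array keeps the owner (and memory) alive
view = arr[:, 1]                 # views keep it alive too
del arr
print(freed)                     # []
del view                         # last reference gone: destroy() runs
print(freed)                     # [True]
```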
Step 1 would be easy if the matlab C API were sane, which is not the case;
they provide functions which are impossible to use correctly (mxSetPr and
> What could be possible to do is add an array interface to the mlabwrap
> proxy classes so they can be used as numpy arrays when required for
> passing to numpy functions (or PIL, etc). Thus we only copy when we
> want to use numpy functions. Then we could define the operators on the
> proxy class to perform their operations on the other side of the bridge.
Yes, that's what I want to do, and in theory this should be possible
without a copy; my initial question at the beginning of the thread was
how to build a numpy proxy class from an existing buffer of data, with
the proxy becoming the owner of the data (i.e. it should do all the
deallocation, including, here, cleaning up the mxArray structures).
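The lazy-proxy idea from the quoted suggestion can be sketched with numpy's `__array__` hook, a sibling of the array interface: operators are evaluated on the far side and return new proxies, and data is copied into Python only when numpy actually asks for it. Everything here is hypothetical: `FakeEngine` is a dict standing in for the matlab workspace, and `RemoteArray` is not mlabwrap's proxy class.

```python
import numpy as np

class FakeEngine:
    """Hypothetical stand-in for the matlab workspace: named arrays in a dict.
    No matlab API is called; `transfers` counts copies into Python."""

    def __init__(self):
        self.workspace = {}
        self.transfers = 0

    def fetch(self, name):
        self.transfers += 1
        return np.array(self.workspace[name], copy=True)

class RemoteArray:
    """Proxy for a variable that lives on the engine side."""

    _tmp_count = 0

    def __init__(self, engine, name):
        self.engine = engine
        self.name = name

    def __array__(self, dtype=None, copy=None):
        # Data crosses the bridge only when numpy actually needs it.
        if copy is False:
            raise ValueError("fetching remote data always requires a copy")
        data = self.engine.fetch(self.name)
        return data if dtype is None else data.astype(dtype)

    def __add__(self, other):
        # Evaluate "remotely": the result stays in the engine workspace
        # and only a new proxy is returned.
        if isinstance(other, RemoteArray):
            other = self.engine.workspace[other.name]
        RemoteArray._tmp_count += 1
        name = f"tmp{RemoteArray._tmp_count}"
        self.engine.workspace[name] = self.engine.workspace[self.name] + other
        return RemoteArray(self.engine, name)

eng = FakeEngine()
eng.workspace["x"] = np.arange(3.0)
x = RemoteArray(eng, "x")
y = x + x + 1                 # two "remote" additions, no transfer yet
print(eng.transfers)          # 0
print(np.asarray(y))          # [1. 3. 5.]
print(eng.transfers)          # 1
```

The design keeps the copy count explicit: chains of remote operations cost no transfers, and only the final conversion to a numpy array pays for one copy.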
More information about the Numpy-discussion