[Numpy-discussion] Numpy performance vs Matlab.

Christopher Barker Chris.Barker@noaa....
Wed Jan 7 11:51:58 CST 2009

> Nicolas ROUX wrote:
>> The big trouble I have is a large team of people within my company is ready to replace Matlab by Numpy/Scipy/Matplotlib,

we like that!

>> This is a testcase that people would like to see working without any code restructuring.
>> The reasons are:
>> - this way of writing is fairly natural.

Only if you haven't wrapped your brain around array-oriented 
programming! (see below)

>> - the original code which showed me the matlab/Numpy performance differences is much more complex,
>> and can't benefit from broadcasting or other numpy tips (I can later give this code)

so you're asking: "how can I make this code faster without changing it?" 
The only way to do that is to change python or numpy, and while it might 
be nice to do that to improve performance in this type of case, it's a 
tall order!

It's really not a good goal, anyway -- python/numpy is by no means a 
drop-in replacement for MATLAB -- they are very different beasts. 
Personally, I think most of the differences favor Python, but if you try 
to write python the same way you'd write MATLAB, you'll lose most of the 
benefits -- you might as well stick with MATLAB.

However, in this case, MATLAB was traditionally slow with loops and 
indexing and needed to be vectorized for decent performance as well.

It look like they now have a nice JIT compiler for this sort of thing -- 
to get a similar effect in numpy, you'll need to use weave or Cython or 
something, notable not as easy as having the interpreter just do it for you.

I'd love to see a numpy-aware psyco some day, an maybe the new buffer 
interface will facilitate that, but it's inherently harder with numpy -- 
MATLAB at least used to be limited to 2-d arrays of doubles, so far less 
special casing to be done.

Even with this nifty JIT, I think Python has many advantages -- if your 
code is well written, there will be a only a few places with these sorts 
of performance bottlenecks, and weave or Cython, or SWIG, or Ctypes, or 
f2py can all give you a good solution.

One other thought -- could numexp help here?

About array-oriented programming:

All lot of folks seem to think that the only reason to "vectorize" code 
in MATLAB, numpy, etc, is for better performance. If MATLAB now has a 
good JIT, then there is no point -- I think that's a mistake. If you 
write your code to work with arrays of data, you get more compact, less 
bug-prone code than if you are working with indexed elements all the 
time. I also think the code is clearer most of the time. I say most, 
because sometimes you do need to do "tricks" to vectorize that can 
obfuscate the code.

I understand that this may be a simplified example, and the real 
use-case could be quite different. However:

>> a = numpy.zeros((dim,dim,3))

so we essentially have three square arrays stacked together -- what do 
they represent? that might help guide you, but without that, I can still 

>> for i in range(dim):
>>     for j in range(dim):

this really means -- for every element of the 2-d arrays, which can be 
written as: a[:,:]

>>         a[i,j,0] = a[i,j,1]
>>         a[i,j,2] = a[i,j,0]
>>         a[i,j,1] = a[i,j,2]

and this is simply swapping the three around. So, if you start out 
thinking in terms of a set of 2-d arrays, rather than a huge pile of 
elements, the code you will arrive at is more like:

a[:,:,0] = a[:,:,1]
a[:,:,2] = a[:,:,0]
a[:,:,1] = a[:,:,2]

With no loops:

or you could give them names:

a0 = a[:,:,0]
a1 = a[:,:,1]
a2 = a[:,:,2]


a0[:] = a1
a2[:] = a0
a1[:] = a2

which, of course, is really:

a[:,:,:] = a1.reshape((dim,dim,1))

but I suspect that that's the result of a typo.


Christopher Barker, Ph.D.

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception


More information about the Numpy-discussion mailing list