[Numpy-discussion] Numpy performance vs Matlab.
Christopher Barker
Chris.Barker@noaa....
Wed Jan 7 11:51:58 CST 2009
> Nicolas ROUX wrote:
>> The big trouble I have is a large team of people within my company is ready to replace Matlab by Numpy/Scipy/Matplotlib,
we like that!
>> This is a testcase that people would like to see working without any code restructuring.
>> The reasons are:
>> - this way of writing is fairly natural.
Only if you haven't wrapped your brain around array-oriented
programming! (see below)
>> - the original code which showed me the matlab/Numpy performance differences is much more complex,
>> and can't benefit from broadcasting or other numpy tips (I can later give this code)
so you're asking: "how can I make this code faster without changing it?"
The only way to do that is to change python or numpy, and while it might
be nice to do that to improve performance in this type of case, it's a
tall order!
It's really not a good goal, anyway -- python/numpy is by no means a
drop-in replacement for MATLAB -- they are very different beasts.
Personally, I think most of the differences favor Python, but if you try
to write python the same way you'd write MATLAB, you'll lose most of the
benefits -- you might as well stick with MATLAB.
However, in this case, MATLAB was traditionally slow with loops and
indexing and needed to be vectorized for decent performance as well.
It look like they now have a nice JIT compiler for this sort of thing --
to get a similar effect in numpy, you'll need to use weave or Cython or
something, notable not as easy as having the interpreter just do it for you.
I'd love to see a numpy-aware psyco some day, an maybe the new buffer
interface will facilitate that, but it's inherently harder with numpy --
MATLAB at least used to be limited to 2-d arrays of doubles, so far less
special casing to be done.
Even with this nifty JIT, I think Python has many advantages -- if your
code is well written, there will be a only a few places with these sorts
of performance bottlenecks, and weave or Cython, or SWIG, or Ctypes, or
f2py can all give you a good solution.
One other thought -- could numexp help here?
About array-oriented programming:
All lot of folks seem to think that the only reason to "vectorize" code
in MATLAB, numpy, etc, is for better performance. If MATLAB now has a
good JIT, then there is no point -- I think that's a mistake. If you
write your code to work with arrays of data, you get more compact, less
bug-prone code than if you are working with indexed elements all the
time. I also think the code is clearer most of the time. I say most,
because sometimes you do need to do "tricks" to vectorize that can
obfuscate the code.
I understand that this may be a simplified example, and the real
use-case could be quite different. However:
>> a = numpy.zeros((dim,dim,3))
so we essentially have three square arrays stacked together -- what do
they represent? that might help guide you, but without that, I can still
see:
>> for i in range(dim):
>> for j in range(dim):
this really means -- for every element of the 2-d arrays, which can be
written as: a[:,:]
>> a[i,j,0] = a[i,j,1]
>> a[i,j,2] = a[i,j,0]
>> a[i,j,1] = a[i,j,2]
and this is simply swapping the three around. So, if you start out
thinking in terms of a set of 2-d arrays, rather than a huge pile of
elements, the code you will arrive at is more like:
a[:,:,0] = a[:,:,1]
a[:,:,2] = a[:,:,0]
a[:,:,1] = a[:,:,2]
With no loops:
or you could give them names:
a0 = a[:,:,0]
a1 = a[:,:,1]
a2 = a[:,:,2]
then:
a0[:] = a1
a2[:] = a0
a1[:] = a2
which, of course, is really:
a[:,:,:] = a1.reshape((dim,dim,1))
but I suspect that that's the result of a typo.
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
More information about the Numpy-discussion
mailing list