[Numpy-discussion] Interesting Psyco/Numeric/Numarray comparison
Perry Greenfield
perry at stsci.edu
Wed Feb 5 07:06:08 CST 2003
Tim Hochberg writes:
> I was inspired by Armin's latest Psyco version to try and see how well
> one could do with NumPy/NumArray implemented in Psycotic Python. I wrote
> a bare bones, pure Python, Numeric array class loosely based on Jnumeric
> (which in turn was loosely based on Numeric). The buffer is just
> Python's array.array. At the moment, all that one can do to the arrays
> is add and index them and the code is still a bit of a mess. I plan to
> clean things up over the next week in my copious free time <0.999 wink>
> and at that point it should be easy to add the remaining operations.
>
> I benchmarked this code, which I'm calling Psymeric for the moment,
> against NumPy and Numarray to see how it did. I used a variety of array
> sizes, but mostly relatively large arrays of shape (500,100) and of type
> Float64 and Int32 (mixed and with consistent types) as well as scalar
> values. Looking at the benchmark data one comes to three main conclusions:
> * For small arrays NumPy always wins. Both Numarray and Psymeric have
> much larger overhead.
> * For large, contiguous arrays, Numarray is about twice as fast as
> either of the other two.
> * For large, noncontiguous arrays, Psymeric and NumPy are ~20% faster
> than Numarray
> The impressive thing is that Psymeric is generally slightly faster than
> NumPy when adding two arrays. It's slightly slower (~10%) when adding an
> array and a scalar although I suspect that could be fixed by some
> special casing a la Numarray. Adding two (500,100) arrays of type
> Float64 together results in the following timings:
>               psymeric    numpy       numarray
> contiguous    0.0034 s    0.0038 s    0.0019 s
> stride-2      0.0020 s    0.0023 s    0.0033 s
>
> I'm not sure if this is important, but it is an impressive demonstration
> of Psyco! More later when I get the code a bit more cleaned up.
>
> -tim
>
The "psymeric" results are indeed interesting. However, I'd like to
make some remarks about numarray benchmarks. At this stage, most of
the focus has been on large, contiguous array performance (and as
can be seen that is where numarray does best). There are a number
of other improvements that can and will be made to numarray performance
so some of the other benchmarks are bound to improve (how much is
uncertain). For example, the current behavior with strided arrays
results in looping over subblocks of the array, and that looping is
done on relatively small blocks in Python. We haven't done any tuning
yet to see what the optimum size of block should be (it may be machine
dependent as well), and it is likely that the loop will eventually be
moved into C. Small array performance should improve quite a bit; we
are looking into how to do that now and should soon have a better idea
of whether we can beat Numeric's performance.
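
To make the subblock idea concrete, here is a rough sketch in plain
Python. It is not numarray's actual code: the function name, the
`block` parameter and the use of array.array buffers are assumptions
made only for illustration. It walks a strided selection in fixed-size
subblocks, with the outer loop written in Python.

```python
from array import array

def strided_add_blockwise(a, b, start, stop, step, block=4):
    """Add every `step`-th element of two array.array buffers.

    Hypothetical illustration: the strided selection is processed in
    subblocks of `block` elements, with the outer loop in Python.
    """
    indices = range(start, stop, step)
    out = array(a.typecode)
    for lo in range(0, len(indices), block):
        chunk = indices[lo:lo + block]      # one subblock of strided indices
        # Gather the subblock into temporaries, then add elementwise.
        a_blk = [a[i] for i in chunk]
        b_blk = [b[i] for i in chunk]
        out.extend(x + y for x, y in zip(a_blk, b_blk))
    return out

a = array('d', range(10))
b = array('d', [10.0] * 10)
result = strided_add_blockwise(a, b, 0, 10, 2, block=2)
```

The result does not depend on `block`; only the number of trips through
the Python-level loop does, which is why the block size is a tuning
parameter.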

But the "psymeric" approach raises an obvious question (implied I guess,
but not explicitly stated). With Psyco, is there a need for Numeric or
numarray at all? I haven't thought this through in great detail, but at
least one issue seems tough to address in this approach, and that is
handling numeric types not supported by Python (e.g., Int8, Int16, UInt16,
Float32, etc.). Are you offering the possibility of the "psymeric"
approach as being the right way to go, and if so, how would you handle
this issue?
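
A small sketch of the type issue, using only the standard array module.
array.array can store narrow types, but indexing returns ordinary Python
floats and ints, so the arithmetic itself is done in double precision or
unbounded integers. The helpers `store_float32` and `wrap_int8` are
hypothetical names introduced here to show what would have to be
emulated by hand; they are not part of Psymeric, Numeric or numarray.

```python
from array import array

# array.array can *store* narrow types ('f' = 32-bit float, 'b' = signed 8-bit int)
f32 = array('f', [0.1, 0.2])
i8 = array('b', [100, 100])

# Indexing returns ordinary Python objects, so the arithmetic runs at
# double precision / unbounded integer width, not at the stored width.
f_sum = f32[0] + f32[1]     # a Python float (C double)
i_sum = i8[0] + i8[1]       # a Python int equal to 200, outside the Int8 range

def store_float32(value):
    """Hypothetical helper: round a double to Float32 via an 'f' array."""
    return array('f', [value])[0]

def wrap_int8(value):
    """Hypothetical helper: emulate two's-complement Int8 wraparound."""
    return (value + 128) % 256 - 128
```

Storing `i_sum` back into a 'b' array raises OverflowError rather than
wrapping, so the narrow-type semantics have to be chosen and coded
explicitly for every operation.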

On the other hand, there are lots of algorithms that cannot be handled
well with array manipulations. It would seem that psyco would be a natural
alternative in such cases (as long as one is content to use Float64 or
Int32), but it isn't obvious that these require arrays as anything but
data structures (e.g. places to obtain and store scalars).
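
As one illustration of such an algorithm (chosen here as an example, not
taken from the thread), a recurrence in which each output depends on the
previous output does not map onto elementwise array operations. In the
sketch below the array serves only as a place to read and store scalars;
the function name `smooth` is hypothetical, and the code is ordinary
Python of the kind a specializing compiler such as Psyco would target.

```python
from array import array

def smooth(x, alpha):
    """Exponential smoothing: y[i] = alpha*x[i] + (1 - alpha)*y[i-1].

    Each step needs the previous result, so the loop is inherently
    scalar; the arrays are used purely as storage.
    """
    y = array('d', [0.0]) * len(x)   # preallocated Float64 buffer
    prev = 0.0
    for i in range(len(x)):
        prev = alpha * x[i] + (1.0 - alpha) * prev
        y[i] = prev
    return y

smoothed = smooth(array('d', [1.0, 1.0, 1.0]), 0.5)
```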
Perry Greenfield