[Psyco-devel] RE: [Numpy-discussion] Interesting Psyco/Numeric/Numarray comparison
tim.hochberg at ieee.org
Wed Feb 5 08:54:05 CST 2003
Perry Greenfield wrote:
>The "psymeric" results are indeed interesting. However, I'd like to
>make some remarks about numarray benchmarks. At this stage, most of
>the focus has been on large, contiguous array performance (and as
>can be seen that is where numarray does best). There are a number
>of other improvements that can and will be made to numarray performance
>so some of the other benchmarks are bound to improve (how much is
>uncertain). For example, the current behavior with strided arrays
>results in looping over subblocks of the array, and that looping is
>done on relatively small blocks in Python. We haven't done any tuning
>yet to see what the optimum size of block should be (it may be machine
>dependent as well), and it is likely that the loop will eventually be
>moved into C. Small array performance should improve quite a bit, we
>are looking into how to do that now and should have a better idea
>soon of whether we can beat Numeric's performance or not.
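The blocked-loop scheme Perry describes for strided arrays can be sketched roughly as follows. This is a minimal sketch, not numarray's actual code. It assumes a 1-D strided view described by hypothetical `offset`, `stride` and `count` parameters. Each block of elements is gathered into a small contiguous buffer, a contiguous-only kernel runs on that buffer, and the results are scattered back. The `blocksize` parameter is the tuning knob mentioned above, and its default here is arbitrary.

```python
from array import array

def contiguous_kernel(buf):
    # Stand-in for a fast C loop that only understands contiguous data;
    # here it simply doubles every element in place.
    for i in range(len(buf)):
        buf[i] *= 2.0

def apply_strided(data, offset, stride, count, blocksize=4):
    """Apply contiguous_kernel in place to the `count` elements
    data[offset], data[offset + stride], ..., one block at a time."""
    for start in range(0, count, blocksize):
        n = min(blocksize, count - start)
        first = offset + start * stride
        stop = first + n * stride
        # Gather: the extended slice copies n strided elements
        # into a new contiguous array of the same typecode.
        buf = data[first:stop:stride]
        contiguous_kernel(buf)
        # Scatter: write the results back to their strided positions.
        data[first:stop:stride] = buf

data = array('d', range(10))
apply_strided(data, offset=1, stride=2, count=5, blocksize=2)
print(list(data))  # [0.0, 2.0, 2.0, 6.0, 4.0, 10.0, 6.0, 14.0, 8.0, 18.0]
```

In this sketch the Python-level `for start in ...` loop is what Perry proposes moving into C. A larger `blocksize` means fewer trips through that loop but a bigger scratch copy per block.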
I fully expect numarray to beat Numeric for large arrays eventually, just
based on the fact that psymeric already tends to be slightly faster than
Numeric in many cases. However, for small arrays it seems that you're likely
to be fighting Python's function-call overhead unless you go
completely, or nearly completely, to C. But that would be a shame, as it
would make modifying/extending numarray that much harder.
>But "psymeric" approach raises an obvious question (implied I guess, but
>not explicitly stated). With Psyco, is there a need for Numeric or
>numarray at all? I haven't thought this through in great detail, but at
>least one issue seems tough to address in this approach, and that is
>handling numeric types not supported by Python (e.g., Int8, Int16, UInt16,
>Float32, etc.). Are you offering the possibility of the "psymeric"
>approach as being the right way to go,
I think there are too many open questions at this point for it to be a serious
contender. It's interesting enough, and the payoff would be big enough,
that I think it's worth throwing out some of the questions and seeing if
anything interesting pops out.
> and if so, how would you handle
The types issue may not be a problem. Python's array.array supports a
full set of types. However,
Psyco does not currently support fast operations on types 'f', 'I' and
'L'. I don't know if this is a technical problem or something that's
likely to be resolved in time. The 'f' (Float32) case is critical; the
others less so.
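As an illustration of the types point, the standard-library `array` module has typecodes that cover the Numeric types Perry lists. The mapping below is my own assumption and not part of any library. The module only guarantees minimum item sizes, although 'b', 'h', 'H', 'f' and 'd' have the sizes shown on common platforms. This runs on plain CPython without Psyco. It also shows why 'f' matters, because values stored in a Float32 array are rounded to single precision.

```python
from array import array

# Hypothetical mapping from Numeric/numarray type names to array.array typecodes.
NUMERIC_TO_TYPECODE = {
    'Int8': 'b',     # signed char
    'UInt8': 'B',    # unsigned char
    'Int16': 'h',    # signed short
    'UInt16': 'H',   # unsigned short
    'Int32': 'i',    # C int; 4 bytes on common platforms
    'Float32': 'f',  # C float
    'Float64': 'd',  # C double, same precision as a Python float
}

def make_array(numeric_type, values):
    """Build an array.array holding `values` with the requested element type."""
    return array(NUMERIC_TO_TYPECODE[numeric_type], values)

a16 = make_array('UInt16', [1, 2, 65535])
f32 = make_array('Float32', [0.1])
f64 = make_array('Float64', [0.1])
print(a16.itemsize, f32.itemsize, f64.itemsize)  # 2 4 8 on typical platforms
print(f32[0] == 0.1, f64[0] == 0.1)  # False True: Float32 storage rounds 0.1
```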
Armin, if you're reading this perhaps you'd like to comment?
>On the other hand, there are lots of algorithms that cannot be handled
>well with array manipulations.
This is where the Psyco approach would shine. One occasionally runs into
cases where some part of the computation just cannot be done naturally
with array operations. A common case is the equivalent of this bit of C
code: "A[i] = (C[i]<TOL) ? B[i]+5 : C[i]-5". This can be done using
take, but it requires a bunch of extra memory (three arrays' worth) and
extra calculations. In principle, at least, this could be done using psymeric
in a more natural way, without the extra memory and calculations.
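To make the trade-off concrete, here is a plain-Python sketch of both styles for the C expression above, with `array.array` as storage. The first function is a stand-in for the `take`-based array-operation approach, not Numeric code. It materializes a mask and two full-size candidate arrays before it selects. The second is the single element-wise loop that a Psyco-compiled psymeric could, in principle, run quickly with no temporaries. The function names and the value of TOL are illustrative. Psyco is not used here, and under plain CPython the loop is not fast.

```python
from array import array

TOL = 0.5  # illustrative threshold

def select_with_temporaries(b, c, tol=TOL):
    # Array-operation style: build three full-size temporaries first.
    mask = [ci < tol for ci in c]                # temporary 1: the condition
    b_plus = array('d', [bi + 5 for bi in b])    # temporary 2: B + 5
    c_minus = array('d', [ci - 5 for ci in c])   # temporary 3: C - 5
    return array('d', [bp if m else cm
                       for m, bp, cm in zip(mask, b_plus, c_minus)])

def select_with_loop(b, c, tol=TOL):
    # Element-wise style: one pass, one output array, no temporaries.
    a = array('d', [0.0]) * len(c)
    for i in range(len(c)):
        a[i] = b[i] + 5 if c[i] < tol else c[i] - 5
    return a

b = array('d', [1.0, 2.0, 3.0])
c = array('d', [0.1, 0.9, 0.4])
print(select_with_loop(b, c) == select_with_temporaries(b, c))  # True
```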
> It would seem that psyco would be a natural
>alternative in such cases (as long as one is content to use Float64 or
>Int32), but it isn't obvious that these require arrays as anything but
>data structures (e.g. places to obtain and store scalars).
That's not been my experience. When I've run into awkward cases like
this it's been in situations where nearly all of my computations could
Anyway, here are the issues with this type of approach as I see them:
* Types: I believe that this should not be a problem.
* Interfacing with C/Fortran: This seems necessary for any Numeric
wannabe. It seems that it must be possible, but it may require a bit of
C code, so it may not be possible to get completely away from C.
* Speed: It's not clear to me at this point whether psymeric would get
any faster than it currently is. It's pretty fast now, but the factor of
two difference between it and numarray for contiguous arrays (a common
case) is nothing to sneeze at.
* Cross-platform: This is the real killer. Psyco only runs on x86
machines. I don't know if or when that's likely to change. Not being
cross-platform seems to rule this out as a serious contender for a
Numeric replacement for the time being.
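On the C-interfacing point, an `array.array` already owns a contiguous C buffer. In modern CPython, `ctypes` can therefore view that memory without copying and without hand-written glue code. That is an assumption well beyond the Python of this thread. The sketch below only demonstrates the shared buffer. Passing `view`, or the address from `buffer_info()`, to an actual C or Fortran routine is left hypothetical.

```python
import ctypes
from array import array

data = array('d', [1.0, 2.0, 3.0])

# buffer_info() reports the address of the underlying C buffer and the element count.
address, length = data.buffer_info()

# Create a ctypes array type of matching size and map it onto the same memory.
view = (ctypes.c_double * length).from_buffer(data)

# A C routine receiving `view` (or `address`) would see the same storage;
# here a write through ctypes is visible from the Python side.
view[0] = 42.0
print(data[0])                            # 42.0
print(ctypes.addressof(view) == address)  # True
```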