[SciPy-user] some benchmark data for numarray, Numeric and scipy-newcore

Travis Oliphant oliphant.travis at ieee.org
Sun Dec 4 16:57:05 CST 2005

Gerard Vermeulen wrote:

>I took a look at the difference between arange in numarray and scipy:
>in numarray arange is a Python function which dispatches the real work
>to a type dependent C function, whereas in scipy arange does
>all calculations in C doubles, which are cast to the requested type.
>This may explain why numarray's arange is 5 times faster than scipy's
>arange on my system (don't ask me why David's results for numarray are
>so slow).
I looked into that last night and saw that one.  We could very easily 
add a "fillarray" function to each data type if that optimization is 
seen as useful.
I think something should definitely be done so that a cast is not done 
every time.  The Arange function could be made much faster, for sure.
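
To make the difference concrete, here is a minimal pure-Python sketch of the two strategies, using the standard-library array module rather than any of the packages discussed. The names arange_via_double and arange_typed_fill are hypothetical and are not functions in scipy core or numarray; the second one only illustrates the idea of a per-type "fillarray" routine.

```python
from array import array

INT_TYPECODES = "bBhHiIlLqQ"  # integer typecodes of the array module

def arange_via_double(n, typecode):
    """Compute 0..n-1 as C doubles, then cast into the requested type."""
    # Pass 1: every value is produced as a double, whatever the target type.
    tmp = array("d", (float(i) for i in range(n)))
    # Pass 2: a separate cast into the requested type.
    if typecode in INT_TYPECODES:
        return array(typecode, (int(x) for x in tmp))
    return array(typecode, tmp)

def arange_typed_fill(n, typecode):
    """Hypothetical per-type fill: write values directly in the target type."""
    if typecode in INT_TYPECODES:
        return array(typecode, range(n))  # no intermediate doubles
    return array(typecode, (float(i) for i in range(n)))
```

Both functions return the same values; the second simply skips the intermediate pass through doubles, which is the cast the paragraph above suggests avoiding.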

The other issue of vector-vector and vector-scalar operations, I'm less 
convinced about.   Do we really need a whole other class of functions in 
the ufunc machinery?   If so, I'm inclined to include them in the math 
operations for array-scalars, rather than the ufunc machinery.

The major slow-down that does have me wondering whether an algorithm 
change (or optimization) is necessary is in lines 4 and 7 of the benchmark.  These are 
mixed-type operations which I think are exercising the BUFFER_LOOP 
section of the general ufunc code.   As the array sizes are larger than 
the buffer size (default is 80000 bytes and could be changed), no copy 
is made.   In Numeric, a copy-cast is done on the entire array which is 
the main reason, I think, for its slower performance.   In scipy core, 
currently, the cast is only done on a filled buffer.  Right now, there 
are two things happening which could be optimized: 

1) even if an array is not misbehaved it is still copied over into a 
buffer so that the inner loops are performed on the buffers.   
Technically, this is not necessary, but otherwise we would have to 
figure out a different way to signal that the inner loop should be 
called (right now it's when the buffers are filled).  Otherwise it would 
have to be some combination of filled buffer or the more complicated 
notion of (single-striding no longer possible for this array).

2) Items are copied over to the buffer one at a time.  We should take 
advantage of contiguous chunks where we can.
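
The two points above can be sketched in pure Python, again with the standard-library array module. This is only an illustration of a buffered mixed-type loop, not the actual ufunc code: BUFSIZE, add_buffered_itemwise and add_buffered_chunked are hypothetical names, and the buffer is counted in elements and kept tiny so the chunking is visible.

```python
from array import array

BUFSIZE = 4  # hypothetical buffer size in elements, tiny for illustration

def add_buffered_itemwise(ints, doubles, bufsize=BUFSIZE):
    """int + double through a cast buffer that is filled one item at a time."""
    out = array("d")
    buf = array("d", [0.0] * bufsize)
    for start in range(0, len(ints), bufsize):
        count = min(bufsize, len(ints) - start)
        for i in range(count):
            buf[i] = float(ints[start + i])       # item-by-item copy-cast
        for i in range(count):                    # inner loop on the filled buffer
            out.append(buf[i] + doubles[start + i])
    return out

def add_buffered_chunked(ints, doubles, bufsize=BUFSIZE):
    """Same result, but each contiguous chunk is cast into the buffer in one step."""
    out = array("d")
    for start in range(0, len(ints), bufsize):
        stop = min(start + bufsize, len(ints))
        buf = array("d", map(float, ints[start:stop]))  # whole chunk at once
        for i, x in enumerate(buf):
            out.append(x + doubles[start + i])
    return out
```

In both versions only one buffer's worth of data is cast at a time, as opposed to the copy-cast of the entire array described for Numeric. The second version shows the kind of contiguous-chunk copy suggested in point 2.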

In short, numarray is doing a better job of handling the memory for the 
misbehaved cases and we could learn something from that.

