[Numpy-discussion] Optimizing mean(axis=0) on a 3D array

Tim Hochberg tim.hochberg at ieee.org
Sun Aug 27 10:36:56 CDT 2006

Martin Spacek wrote:
> Tim Hochberg wrote:
>> Here's an approach (mean_accumulate) that avoids making any copies of 
>> the data. It runs almost 4x as fast as your approach (called baseline 
>> here) on my box. Perhaps this will be useful:
> --snip--
>> def mean_accumulate(data, indices):
>>     result = np.zeros([32, 32], float)
>>     for i in indices:
>>         result += data[i]
>>     result /= len(indices)
>>     return result
> Great! I got a roughly 9x speed improvement using take() in combination 
> with this approach. Thanks Tim!
> Here's what my code looks like now:
>  >>> def mean_accum(data):
>  >>>     result = np.zeros(data[0].shape, np.float64)
>  >>>     for dataslice in data:
>  >>>         result += dataslice
>  >>>     result /= len(data)
>  >>>     return result
>  >>>
>  >>> # frameis are int64
>  >>> frames = data.take(frameis.astype(np.int32), axis=0)
>  >>> meanframe = mean_accum(frames)
> I'm surprised that using a python for loop is faster than the built-in 
> mean method. I suppose mean() can't perform the same in-place operations 
> because in certain cases doing so would fail?
I'm not sure why mean is slow here, although possibly it's a locality 
issue: mean likely computes along axis zero each time, which means 
it's killing the cache, whereas the accumulate version 
is cache friendly.  One thing to keep in mind about Python for loops is 
that they are slow if you are doing a simple computation inside (a 
single add, for instance). IIRC, they are tens of times slower. However, 
here one is doing 1000-odd operations in the inner loop (one add per 
element of a 32x32 frame), so the loop overhead is minimal.
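
A minimal sketch for checking this on your own machine follows. The array sizes, dtype and random indices are made up for illustration and are not Martin's real data. It only confirms that the accumulate loop and the fancy-index-plus-mean baseline agree, and prints timings that will vary with hardware and NumPy version.

```python
import timeit

import numpy as np


def mean_accumulate(data, indices):
    """Average the frames data[i] for i in indices without copying them."""
    result = np.zeros(data.shape[1:], dtype=np.float64)
    for i in indices:
        # One vectorized add over a whole frame per Python-level iteration,
        # so the interpreter overhead is spread over ~1000 element adds.
        result += data[i]
    result /= len(indices)
    return result


# Hypothetical test data: 500 frames of 32x32 pixels, 200 frames selected.
rng = np.random.default_rng(0)
data = rng.integers(0, 256, size=(500, 32, 32)).astype(np.uint8)
indices = rng.integers(0, len(data), size=200)

baseline = data[indices].mean(axis=0)      # copies the selected frames first
accumulated = mean_accumulate(data, indices)

t_loop = timeit.timeit(lambda: mean_accumulate(data, indices), number=20)
t_mean = timeit.timeit(lambda: data[indices].mean(axis=0), number=20)
print(f"accumulate loop: {t_loop:.4f}s   index + mean: {t_mean:.4f}s")
```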

(What would be perfect here is something just like take, but that 
returned an iterator instead of a new array as that could be done with 
no copying -- unfortunately such a beast does not exist as far as I know)
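
Something close to that can be approximated in pure Python with a generator. This is only a sketch: iter_take and mean_from_iter are hypothetical helper names, not NumPy functions. It relies on the fact that indexing an array with a single integer returns a view rather than a copy.

```python
import numpy as np


def iter_take(data, indices):
    """Yield data[i] for each i in indices; each item is a view, not a copy."""
    for i in indices:
        yield data[int(i)]


def mean_from_iter(frames):
    """Average an iterable of equally shaped frames.

    Raises StopIteration if the iterable is empty.
    """
    frames = iter(frames)
    # The only allocation: a float64 accumulator seeded with the first frame.
    result = np.array(next(frames), dtype=np.float64)
    count = 1
    for frame in frames:
        result += frame
        count += 1
    result /= count
    return result


# Tiny made-up example: 4 frames of shape (2, 3), with frame 2 used twice.
data = np.arange(4 * 2 * 3).reshape(4, 2, 3)
indices = [0, 2, 2]
meanframe = mean_from_iter(iter_take(data, indices))
```

The Python-level loop is still there, so this saves the copy that take makes but not the per-frame indexing cost.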

I'm actually surprised that the take version is faster than my original 
version since it makes a big ol' copy. I guess this is an indication 
that indexing is more expensive than I realize. That's why nothing beats 

One experiment is to reshape your data so that it's friendly to mean 
(assuming it really does operate on axis zero) and try that. However, 
this turns out to be a huge pessimization, mostly because take + 
transpose is pretty slow.
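
The code for that experiment isn't shown in this message, so the following is only a guess at what such a layout change might look like, on made-up data. It moves the frame axis last and forces a contiguous copy so that the reduction runs over adjacent memory. The extra copy in ascontiguousarray is the kind of cost that can outweigh any gain.

```python
import numpy as np

# Hypothetical test data: 500 frames of 32x32 pixels, 200 frames selected.
rng = np.random.default_rng(0)
data = rng.integers(0, 256, size=(500, 32, 32)).astype(np.uint8)
indices = rng.integers(0, len(data), size=200)

frames = data.take(indices, axis=0)           # copy, shape (200, 32, 32)

# transpose() alone only changes strides; ascontiguousarray() makes a second
# full copy in which each pixel's 200 samples are adjacent in memory.
pixel_major = np.ascontiguousarray(frames.transpose(1, 2, 0))  # (32, 32, 200)

mean_last_axis = pixel_major.mean(axis=-1)    # reduce over contiguous samples
mean_first_axis = frames.mean(axis=0)         # the straightforward version
```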


> Martin
