[Numpy-discussion] np.mean and np.std performances

Davide Lasagna lasagnadavide@gmail....
Sun Apr 18 07:16:00 CDT 2010

Hi all,

I noticed some performance problems with np.mean and np.std functions.
Here is the console output in ipython:

# make some test data
>>>: a = np.arange(80*64, dtype=np.float64).reshape(80, 64)
>>>: c = np.tile( a, [10000, 1, 1])

>>>: timeit np.mean(c, axis=0)
1 loops, best of 3: 2.09 s per loop

But using reduce is much faster:

from functools import reduce  # built in on Python 2; needs this import on Python 3

def mean_reduce(c):
    # sum the 2-D slices pairwise, then divide by the number of slices
    return reduce(lambda som, array: som + array, c) / c.shape[0]

>>>: timeit mean_reduce(c)
1 loops, best of 3: 355 ms per loop
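For comparison, the same reduction can be expressed with a ufunc reduce, which also avoids the Python-level lambda; this is only a sketch (the `mean_ufunc` name is mine, and timings will vary with NumPy version and hardware):

```python
import numpy as np

def mean_ufunc(c):
    # np.add.reduce sums along axis 0 in C, without a Python-level loop.
    return np.add.reduce(c, axis=0) / c.shape[0]

# small test data in the same shape family as above
a = np.arange(8 * 6, dtype=np.float64).reshape(8, 6)
c = np.tile(a, [50, 1, 1])

# agrees with np.mean to floating-point accuracy
print(np.allclose(mean_ufunc(c), np.mean(c, axis=0)))
```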

The same applies to np.std():

# slightly smaller c matrix (otherwise too much memory is used)
>>>: c = np.tile( a, [7000, 1, 1])

>>>: timeit np.std(c, axis=0)
1 loops, best of 3: 3.73 s per loop

With the reduce version:

def std_reduce(c):
    # note: subtracts the mean in place, so the caller's array is modified
    c -= mean_reduce(c)
    return np.sqrt(reduce(lambda som, array: som + array**2, c) / c.shape[0])

>>>: timeit std_reduce(c)
1 loops, best of 3: 1.18 s per loop

For the std function, also look at the memory usage during execution of the two versions.
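If mutating the input is not acceptable, the memory savings can be kept with a two-pass version that accumulates squared deviations one slice at a time; the `std_twopass` name is mine, and this is a sketch rather than a drop-in replacement (it assumes reduction over axis 0 and the default ddof=0 of np.std):

```python
import numpy as np

def std_twopass(c):
    # First pass: mean over axis 0. Second pass: accumulate squared
    # deviations slice by slice, so the only large temporary alive at any
    # moment is a single 2-D slice.
    m = np.add.reduce(c, axis=0) / c.shape[0]
    ssd = np.zeros_like(m)
    for arr in c:
        d = arr - m
        ssd += d * d
    return np.sqrt(ssd / c.shape[0])

a = np.arange(12, dtype=np.float64).reshape(3, 4)
c = np.stack([a, 2 * a, 3 * a])
print(np.allclose(std_twopass(c), np.std(c, axis=0)))
```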

The functions I gave here can easily be modified to accept an axis argument and other options as needed.
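One way to sketch that axis generalization is to move the reduction axis to the front first (the `mean_reduce_axis` name is mine, and np.moveaxis is a later NumPy convenience; np.rollaxis does the same job on older versions):

```python
import numpy as np
from functools import reduce

def mean_reduce_axis(c, axis=0):
    # bring the reduction axis to position 0, then reduce over slices
    c = np.moveaxis(c, axis, 0)
    return reduce(lambda acc, arr: acc + arr, c) / c.shape[0]

c = np.arange(4 * 5 * 6, dtype=np.float64).reshape(4, 5, 6)
print(np.allclose(mean_reduce_axis(c, axis=1), np.mean(c, axis=1)))
```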

Is there any drawback to using them? Why are np.mean and np.std so slow?

I'm sure I'm missing something.


