[Numpy-discussion] std(axis=1) memory footprint issues + moving avg / stddev

Torgil Svensson torgil.svensson at gmail.com
Sat Aug 26 12:02:52 CDT 2006


Hi

ndarray.std(axis=1) seems to have memory issues on large 2D arrays. I
first thought I had a performance problem, but it turned out that std()
used lots of memory and therefore caused lots of swapping.

I want to get an array where element i is the standard deviation of row
i in the 2D array. Running valgrind on the std() call...

$ valgrind --tool=massif python -c "from numpy import *;
a=reshape(arange(100000*100),(100000,100)).std(axis=1)"

... showed a peak of 200 MB of memory, whereas iterating row by row...

$ valgrind --tool=massif python -c "from numpy import *;
a=array([x.std() for x in reshape(arange(100000*100),(100000,100))])"

... peaked at only 40 MB.

This seems unnecessary: the output shape is known before the
calculation starts, so the memory for the result could be preallocated
and the reduction done without large temporaries.
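
In the meantime one can preallocate the result and fill it in chunks of
rows, so the temporaries never grow beyond one chunk. A quick sketch
(the chunk size of 1000 is arbitrary; tune for memory vs. speed):

>>> from numpy import arange, reshape, empty
>>> a = reshape(arange(100000*100), (100000, 100))
>>> out = empty(len(a))    # preallocated result, one std per row
>>> step = 1000            # rows per chunk
>>> for i in range(0, len(a), step):
...     out[i:i+step] = a[i:i+step].std(axis=1)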


My original problem was to get a moving average and a moving standard
deviation (120k rows and N=1000). For the average I guess convolve
should perform well, but is there anything smart for std()? For now I use ...

>>> moving_std=array([a[i:i+n].std() for i in range(len(a)-n)])

which seems to perform quite well.
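
For what it's worth, both moving statistics can also be computed
without a Python-level loop from cumulative sums, using
var(x) = E[x^2] - E[x]^2. A sketch (the helper name is mine; note it
yields len(a)-n+1 windows rather than len(a)-n, and the formula can
lose precision when the values are large compared to their spread):

from numpy import asarray, concatenate, cumsum, sqrt, maximum

def moving_mean_std(a, n):
    a = asarray(a, dtype=float)
    # c[i] holds sum(a[:i]), so the window sum over a[i:i+n]
    # is c[i+n] - c[i].
    c1 = concatenate(([0.0], cumsum(a)))
    c2 = concatenate(([0.0], cumsum(a * a)))
    s1 = c1[n:] - c1[:-n]        # per-window sum
    s2 = c2[n:] - c2[:-n]        # per-window sum of squares
    mean = s1 / n
    var = maximum(s2 / n - mean * mean, 0.0)  # clip round-off negatives
    return mean, sqrt(var)

For the moving average alone, convolve should do the same job:

>>> moving_avg = convolve(a, ones(n)/n, mode='valid')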

BR,

//Torgil



