[Numpy-discussion] std(axis=1) memory footprint issues + moving avg / stddev

Charles R Harris charlesr.harris at gmail.com
Sat Aug 26 12:49:33 CDT 2006


On 8/26/06, Torgil Svensson <torgil.svensson at gmail.com> wrote:
>
> Hi
>
> ndarray.std(axis=1) seems to have memory issues on large 2D-arrays. I
> first thought I had a performance issue but discovered that std() used
> lots of memory and therefore caused lots of swapping.
>
> I want to get an array where element i is the standard deviation of row
> i in the 2D array. Using valgrind on the std() function...
>
> $ valgrind --tool=massif python -c "from numpy import *;
> a=reshape(arange(100000*100),(100000,100)).std(axis=1)"
>
> ... showed me a peak of 200 MB memory while iterating line by line...
>
> $ valgrind --tool=massif python -c "from numpy import *;
> a=array([x.std() for x in reshape(arange(100000*100),(100000,100))])"
>
> ... got a peak of 40 MB memory.
>
> This seems unnecessary since we know before calculations what the
> output shape will be and should therefore be able to preallocate
> memory.
>
>
> My original problem was to get a moving average and a moving standard
> deviation (120k rows and N=1000). For average I guess convolve should
> perform well, but is there anything smart for std()? For now I use ...


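Why not use convolve for the std also? You can't subtract the average first,
but you could convolve the square of the vector with a window of ones and
then use some variant of std = sqrt((convsqrs - n*avg**2)/(n-1)), where
convsqrs is the moving sum of squares and avg is the moving average. There
are possible precision problems, since the subtraction can cancel badly when
the mean is large compared to the spread, but they may not matter for your
application, especially if the moving window isn't really big.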
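
A minimal sketch of that idea, assuming a 1-D input and a window of ones
passed to numpy.convolve. The function name moving_mean_std and its
variable names are made up for illustration and are not part of numpy. It
uses the n-1 divisor from the formula above, whereas ndarray.std() divides
by n by default.

```python
import numpy as np

def moving_mean_std(x, n):
    """Moving mean and std (n-1 divisor) over windows of length n."""
    x = np.asarray(x, dtype=float)
    window = np.ones(n)
    # mode="valid" keeps only positions where the window lies fully inside x,
    # so both results have length len(x) - n + 1.
    sums = np.convolve(x, window, mode="valid")
    sumsqrs = np.convolve(x * x, window, mode="valid")
    avg = sums / n
    # Rounding can push the difference slightly below zero; clip before sqrt.
    var = np.maximum(sumsqrs - n * avg**2, 0.0) / (n - 1)
    return avg, np.sqrt(var)
```

Calling avg, std = moving_mean_std(data, 1000) on a vector of 120k samples
would return two arrays of length 119001. If the precision issue does bite,
subtracting a constant close to the overall mean from the data before
calling the function leaves the std unchanged and reduces the cancellation.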

Chuck