[Numpy-discussion] Memory hungry reduce ops in Numpy

Andreas Müller amueller@ais.uni-bonn...
Tue Nov 15 11:07:54 CST 2011

On 11/15/2011 06:02 PM, Warren Weckesser wrote:
> On Tue, Nov 15, 2011 at 10:48 AM, Andreas Müller 
> <amueller@ais.uni-bonn.de> wrote:
>     On 11/15/2011 05:46 PM, Andreas Müller wrote:
>>     On 11/15/2011 04:28 PM, Bruce Southey wrote:
>>>     On 11/14/2011 10:05 AM, Andreas Müller wrote:
>>>>     On 11/14/2011 04:23 PM, David Cournapeau wrote:
>>>>>     On Mon, Nov 14, 2011 at 12:46 PM, Andreas Müller
>>>>>     <amueller@ais.uni-bonn.de> wrote:
>>>>>>     Hi everybody.
>>>>>>     When I did some normalization using numpy, I noticed that numpy.std uses
>>>>>>     more RAM than I was expecting.
>>>>>>     A quick google search gave me this:
>>>>>>     http://luispedro.org/software/ncreduce
>>>>>>     The site claims that std and other reduce operations are implemented
>>>>>>     naively with many temporaries.
>>>>>>     Is that true? And if so, is there a particular reason for that?
>>>>>>     This issue seems quite easy to fix.
>>>>>>     In particular the link I gave above provides code.
>>>>>     The code provided only implements a few special cases: being more
>>>>>     efficient in those cases only is indeed easy.
>>>>     I am particularly interested in the std function.
>>>>     Is this implemented as a separate function or as an instantiation
>>>>     of a general reduce operation?
>>>     The 'On-line algorithm'
>>>     (http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#On-line_algorithm)
>>>     could save you storage. I would presume that if you know Cython
>>>     you can probably make it quick as well (to address the loop over
>>>     the data).
>>     My question was more along the lines of "why doesn't numpy do the
>>     online algorithm".
>     To be more precise, even without the online version, just
>     computing E(X^2) and E(X)^2 would be good.
>     It seems numpy centers the whole dataset. Otherwise I can't
>     explain why the memory needed should depend
>     on the number of examples.
> Yes, that is what it is doing. See line 63 in the function _var(),
> which is called by _std():
> https://github.com/numpy/numpy/blob/master/numpy/core/_methods.py

Thanks for the clarification. I thought the function was somewhere in
the C code, though I don't know why.
I'll see if I can reformulate the function.
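
As a rough sketch of the kind of reformulation discussed above (not the
actual NumPy code): the data can be processed in blocks of rows, so the
centered temporary is only as large as one block rather than the whole
dataset. The per-block statistics are merged with the standard pairwise
update for mean and sum of squared deviations. The names chunked_std and
chunk_size are hypothetical and not part of NumPy; the reduction is
assumed to run along axis 0.

```python
import numpy as np


def chunked_std(x, chunk_size=1024, ddof=0):
    """Standard deviation along axis 0, computed chunk_size rows at a time.

    Hypothetical helper for illustration; not part of NumPy.
    """
    x = np.asarray(x)
    n = 0                                            # rows seen so far
    mean = np.zeros(x.shape[1:], dtype=np.float64)   # running mean
    m2 = np.zeros(x.shape[1:], dtype=np.float64)     # running sum of squared deviations

    for start in range(0, x.shape[0], chunk_size):
        chunk = np.asarray(x[start:start + chunk_size], dtype=np.float64)
        k = chunk.shape[0]

        # Statistics of this chunk alone; the temporary is chunk-sized.
        chunk_mean = chunk.mean(axis=0)
        chunk_m2 = ((chunk - chunk_mean) ** 2).sum(axis=0)

        # Merge the chunk statistics into the running statistics.
        delta = chunk_mean - mean
        total = n + k
        mean = mean + delta * (k / total)
        m2 = m2 + chunk_m2 + delta ** 2 * (n * k / total)
        n = total

    return np.sqrt(m2 / (n - ddof))


rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=(10_000, 3))
print(np.allclose(chunked_std(data, chunk_size=777), data.std(axis=0)))
```

Computing E(X^2) - E(X)^2 directly would need even less bookkeeping, but
it can lose precision through cancellation when the mean is large
compared to the spread, which is why the sketch merges centered
per-block sums instead.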
