[SciPy-User] Fw: memory error - numpy mean - netcdf4

Phil Morefield philmorefield@yahoo....
Fri Aug 26 16:58:17 CDT 2011


import numpy as np
 
array = netcdf_variable[0]
 
for i in range(1, len(netcdf_variable)):
    array = np.true_divide(np.add(array, netcdf_variable[i]), 2.0)
 
 
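Oops. That's not right: averaging the running result with each new time step weights the later steps far more heavily than the earlier ones. That's what I get for being hasty. Something like this maybe: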
 
#########################################
import numpy as np
 
n = len(netcdf_variable)  # number of time steps
array = np.true_divide(netcdf_variable[0], n)
 
# Read each step from netcdf_variable (not array) and include the last one.
for i in range(1, n):
    array = np.add(array, np.true_divide(netcdf_variable[i], n))
#########################################
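 
A quick way to check both attempts is to run them on a small array and compare against NumPy's own mean. This is only a sketch: the 5 x 2 x 3 array below is a made-up stand-in for netcdf_variable (time on the first axis), and the names halved, shared and expected are just for illustration.

```python
import numpy as np

# Hypothetical stand-in for the netCDF variable: 5 time steps of 2x3 grids.
netcdf_variable = np.arange(30, dtype=np.float64).reshape(5, 2, 3)
n = len(netcdf_variable)

# First attempt: average the running result with each new time step.
halved = netcdf_variable[0]
for i in range(1, n):
    halved = np.true_divide(np.add(halved, netcdf_variable[i]), 2.0)

# Second attempt: add each time step's 1/n share of the mean.
shared = np.true_divide(netcdf_variable[0], n)
for i in range(1, n):
    shared = np.add(shared, np.true_divide(netcdf_variable[i], n))

# Reference: mean over the time axis, one value per grid cell.
expected = netcdf_variable.mean(axis=0)

print(np.allclose(shared, expected))  # the 1/n version matches
print(np.allclose(halved, expected))  # the halving version does not
```

In the halving version the last time step ends up with weight 1/2, the one before it 1/4, and so on, which is why it drifts away from the true mean.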
 


----- Forwarded Message -----
From: Phil Morefield <philmorefield@yahoo.com>
To: srean <srean.list@gmail.com>; SciPy Users List <scipy-user@scipy.org>
Sent: Friday, August 26, 2011 3:33 PM
Subject: Re: [SciPy-User] memory error - numpy mean - netcdf4


"If the values are integers then the running total may overflow."

That's a good point. Though you could just do this:

###################################
import numpy as np

array = netcdf_variable[0]

for i in range(1, len(netcdf_variable)):
    array = np.true_divide(np.add(array, netcdf_variable[i]), 2.0)
###################################
 
The formula you have written looks like you're collapsing everything into a single value. I think he's trying to average a bunch of 2D arrays into a single 2D array.
 
 
 
 
From: srean <srean.list@gmail.com>
To: Phil Morefield <philmorefield@yahoo.com>; SciPy Users List <scipy-user@scipy.org>
Sent: Friday, August 26, 2011 2:00 PM
Subject: Re: [SciPy-User] memory error - numpy mean - netcdf4




> Finally, you're getting the MemoryError because you're trying to put a ginormous array into memory all at once. Your OS can't handle it. Just loop through each time step and keep a running total and counter. Then divide your total (which is an array) by your counter (which is an integer or float) and presto: you have your average. It's plenty fast, don't worry.
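
The running total and counter described above might look roughly like the sketch below. The function name running_mean and the small int16 test array are made up for illustration, and the float64 accumulator is an assumption on my part rather than something from the original suggestion. Only len() and integer indexing are used on the variable, as in the other snippets in this thread.

```python
import numpy as np

def running_mean(steps):
    """Average an iterable of equally shaped arrays, one step at a time."""
    total = None
    count = 0
    for step in steps:
        # Assumption: accumulate in float64 whatever the input dtype is.
        step = np.asarray(step, dtype=np.float64)
        if total is None:
            total = step.copy()
        else:
            total += step
        count += 1
    if count == 0:
        raise ValueError("no time steps to average")
    return total / count

# Hypothetical stand-in for the netCDF variable: 4 time steps of 2x3 grids.
data = np.arange(24, dtype=np.int16).reshape(4, 2, 3)

# Only one time step is read per iteration.
result = running_mean(data[i] for i in range(len(data)))
print(result)
```

Only one time step plus the running total is held in memory at any moment, so the peak memory use is about two 2D grids rather than the whole variable.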

In fact one can even avoid keeping the running total. If the values are integers then the running total may overflow.

Say you have the mean \mu computed from N points and you have a new collection of m points whose mean is t.
Then the mean of the N + m points is:   \mu_{new} = \mu + \frac{m}{N+m} (t - \mu)
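
Applied elementwise, the same update also averages a stack of 2D arrays into a single 2D array. A minimal sketch, assuming the variable can be sliced along its first (time) axis; update_mean, chunk_size and the test array are hypothetical names and values chosen for the example.

```python
import numpy as np

def update_mean(mu, n_seen, chunk):
    """Fold a chunk of m time steps into the mean of the n_seen steps so far."""
    chunk = np.asarray(chunk, dtype=np.float64)
    m = chunk.shape[0]
    t = chunk.mean(axis=0)  # mean of the m new steps, per grid cell
    # mu_new = mu + m / (N + m) * (t - mu), applied to every grid cell
    mu = mu + (float(m) / (n_seen + m)) * (t - mu)
    return mu, n_seen + m

# Hypothetical stand-in for the netCDF variable: 10 time steps of 2x3 grids.
data = np.arange(60, dtype=np.int32).reshape(10, 2, 3)
chunk_size = 4  # hypothetical; pick whatever fits in memory

mu = np.zeros(data.shape[1:], dtype=np.float64)
n_seen = 0
for start in range(0, len(data), chunk_size):
    mu, n_seen = update_mean(mu, n_seen, data[start:start + chunk_size])

print(np.allclose(mu, data.mean(axis=0)))
```

No running total is kept here, only the current mean and the number of steps seen, so the stored values never grow beyond the range of the data.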