[SciPy-User] memory error - numpy mean - netcdf4

srean srean.list@gmail....
Thu Aug 25 00:53:59 CDT 2011


Since you are processing so many files, wouldn't it be better to update the
mean from each file and then close/unload that NetCDF file, i.e. process the
files one at a time?

You would not need to load the entire data set into memory, nor would you
need to maintain the sum (which might risk an overflow in some cases).
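
A minimal sketch of the one-file-at-a-time update, assuming each file yields
an array shaped (time, lat, lon) as in the quoted script. The names
`running_mean` and `iter_blocks` are hypothetical helpers, not part of NumPy
or netCDF4. Only a running mean and a per-cell count of unmasked samples are
kept, so neither the full data set nor a grand total is held in memory.

```python
import os

import numpy as np


def running_mean(blocks):
    """Per-grid-cell mean over axis 0 of all blocks, one block at a time.

    blocks: iterable of plain or masked arrays shaped (time, lat, lon).
    Returns a masked array; cells with no valid samples are masked.
    """
    mean = None   # running mean per grid cell
    count = None  # number of unmasked samples seen per grid cell
    for block in blocks:
        block = np.ma.asarray(block, dtype=np.float64)
        n = block.count(axis=0)                     # valid samples in this block
        block_sum = block.sum(axis=0).filled(0.0)   # sum of this block only
        if mean is None:
            mean = np.zeros(block.shape[1:], dtype=np.float64)
            count = np.zeros(block.shape[1:], dtype=np.int64)
        count += n
        valid = count > 0
        # mean_new = mean_old + (block_sum - n * mean_old) / count_new
        mean[valid] += (block_sum[valid] - n[valid] * mean[valid]) / count[valid]
    if mean is None:
        raise ValueError("no blocks were supplied")
    return np.ma.masked_where(count == 0, mean)


def iter_blocks(root, varname="T_SFC"):
    """Yield one array per .nc file under root, closing each file first.

    Assumption: the variable name and the [4::24, :, :] slice are taken
    from the quoted script; adjust them for other data.
    """
    from netCDF4 import Dataset  # imported here so running_mean has no netCDF4 dependency
    for path, dirs, files in os.walk(root):
        for name in sorted(files):
            if name.endswith(".nc"):
                ds = Dataset(os.path.join(path, name), "r")
                try:
                    block = ds.variables[varname][4::24, :, :]
                finally:
                    ds.close()
                yield block
```

With these helpers, something like `Mean = running_mean(iter_blocks(MainFolder))`
would stand in for the `concatenate` and `mean` calls in the quoted script, and
only one file's slice would be in memory at any time.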

On Wed, Aug 24, 2011 at 11:39 PM, questions anon
<questions.anon@gmail.com> wrote:

> Thanks for your response.
> The error I am receiving is:
>
> Traceback (most recent call last):
>   File "d:\documents and settings\SLBurns\Work\My Dropbox\Python_code\calculate_the_mean_across_multiple_netcdf_files_in_multiple_folders_add_shp_select_dirs.py", line 50, in <module>
>     big_array=N.ma.concatenate(all_TSFC)
>   File "C:\Python27\lib\site-packages\numpy\ma\core.py", line 6155, in concatenate
>     d = np.concatenate([getdata(a) for a in arrays], axis)
> MemoryError
>
>
> I have tried ignoring TIME and only using one slice of lat and long
> (because they are the same for every file). I also tried entering
> the gc.collect() in the loop but nothing seemed to help.
> Anything else I could try? I am dealing with hundreds of files so maybe I
> need a whole different method to calculate the mean?
>
>
>
> On Wed, Aug 24, 2011 at 12:54 PM, Tim Supinie <tsupinie@gmail.com> wrote:
>
>> At what point in the program are you getting the error?  Is there a stack
>> trace?
>>
>> Pending the answers to those two questions, my first thought is to ask how
>> much data you're loading into memory?  How many files are there?  It's
>> possible that you're loading a whole bunch of data that you don't need, and
>> it's not getting cleared out by the garbage collector, which can generate
>> memory errors when you run out of memory.  Try removing as much data loading
>> as you can.  (Are you using TIME?  How big is each array you load in?)
>> Also, if the lats and lons in all the different files are the same, only
>> load the lats and lons from one file.  All these will not only help your
>> program use less memory, but help it run faster.
>>
>> Finally, if that doesn't work, use the gc module and run gc.collect()
>> after every loop iteration to make sure Python's cleaning up after itself
>> like it should.  I think the garbage collector might not always run during
>> loops, which can create problems when you're loading a whole bunch of unused
>> data.
>>
>> Tim
>>
>> On Tue, Aug 23, 2011 at 6:00 PM, questions anon <questions.anon@gmail.com>
>> wrote:
>>
>>> Hi All,
>>> I am receiving a memory error when I try to calculate the Numpy mean
>>> across many NetCDF files.
>>> Is there a way to fix this? The code I am using is below.
>>> Any feedback will be greatly appreciated.
>>>
>>>
>>> from netCDF4 import Dataset
>>> import matplotlib.pyplot as plt
>>> import numpy as N
>>> from mpl_toolkits.basemap import Basemap
>>> from netcdftime import utime
>>> from datetime import datetime
>>> import os
>>>
>>> MainFolder=r"E:/GriddedData/T_SFC/"
>>>
>>> all_TSFC=[]
>>> for (path, dirs, files) in os.walk(MainFolder):
>>>     for dir in dirs:
>>>         print dir
>>>     path=path+'/'
>>>     for ncfile in files:
>>>         if ncfile[-3:]=='.nc':
>>>             #print "dealing with ncfiles:", ncfile
>>>             ncfile=os.path.join(path,ncfile)
>>>             ncfile=Dataset(ncfile, 'r+', 'NETCDF4')
>>>             TSFC=ncfile.variables['T_SFC'][4::24,:,:]
>>>             LAT=ncfile.variables['latitude'][:]
>>>             LON=ncfile.variables['longitude'][:]
>>>             TIME=ncfile.variables['time'][:]
>>>             fillvalue=ncfile.variables['T_SFC']._FillValue
>>>             ncfile.close()
>>>
>>>             #combine all TSFC to make one array for analyses
>>>             all_TSFC.append(TSFC)
>>>
>>> big_array=N.ma.concatenate(all_TSFC)
>>> #calculate the mean of the combined array
>>> Mean=big_array.mean(axis=0)
>>> print "the mean is", Mean
>>>
>>>
>>> #plot output summary stats
>>> map = Basemap(projection='merc',llcrnrlat=-40,urcrnrlat=-33,
>>>               llcrnrlon=139.0,urcrnrlon=151.0,lat_ts=0,resolution='i')
>>> map.drawcoastlines()
>>> map.drawstates()
>>> x,y=map(*N.meshgrid(LON,LAT))
>>> plt.title('TSFC Mean at 3pm')
>>> ticks=[-5,0,5,10,15,20,25,30,35,40,45,50]
>>> CS = map.contourf(x,y,Mean, cmap=plt.cm.jet)
>>> l,b,w,h =0.1,0.1,0.8,0.8
>>> cax = plt.axes([l+w+0.025, b, 0.025, h])
>>> plt.colorbar(CS,cax=cax, drawedges=True)
>>>
>>> plt.savefig((os.path.join(MainFolder, 'Mean.png')))
>>> plt.show()
>>> plt.close()
>>>
>>> print "end processing"
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> SciPy-User mailing list
>>> SciPy-User@scipy.org
>>> http://mail.scipy.org/mailman/listinfo/scipy-user