[SciPy-User] How to ignore NaN values and -32767 in numpy array

Jonathan Rocher jrocher@enthought....
Wed Aug 17 03:11:00 CDT 2011


Hi,

You can build a boolean mask that excludes every value you don't want in
your mean, then compute the mean of the masked array. To illustrate the
concept (with NumPy's names imported into the session), look at:
In [1]: a = array([1,2,3,NaN,5])

In [4]: isnan(a)
Out[4]: array([False, False, False,  True, False], dtype=bool)

In [5]: ~isnan(a)
Out[5]: array([ True,  True,  True, False,  True], dtype=bool)

In [11]: mask = (~isnan(a)) & (a != 3)

In [12]: mask
Out[12]: array([ True,  True, False, False,  True], dtype=bool)

In [13]: a[mask]
Out[13]: array([ 1.,  2.,  5.])

In [14]: a[mask].mean()
Out[14]: 2.6666666666666665

In your code, you need to do something similar before you compute the mean.
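
Applied to gridded data like yours, the same idea might look like the sketch
below. This is only an illustration under some assumptions: the helper name
mean_ignoring_bad and the sample numbers are made up, and -32767 is taken from
your message as the fill value. It uses numpy.ma so that the mean along the
time axis skips bad entries cell by cell instead of flattening the array.

```python
import numpy as np

FILL_VALUE = -32767  # assumed sentinel, taken from the question


def mean_ignoring_bad(data, fill_value=FILL_VALUE, axis=0):
    """Mean along `axis`, skipping NaN entries and entries equal to fill_value.

    Hypothetical helper for illustration; not part of NumPy.
    """
    data = np.asarray(data, dtype=float)
    # True where a value should be left out of the mean
    bad = np.isnan(data) | (data == fill_value)
    masked = np.ma.masked_array(data, mask=bad)
    # Cells with no valid samples come back masked; report them as NaN
    return np.ma.filled(masked.mean(axis=axis), np.nan)


# Three "hours" on a 2x2 grid (made-up numbers)
hours = np.array([
    [[10.0, 20.0], [np.nan, -32767.0]],
    [[12.0, -32767.0], [30.0, -32767.0]],
    [[14.0, 22.0], [34.0, np.nan]],
])

result = mean_ignoring_bad(hours)
print(result)
```

For this sample the per-cell means are 12, 21 and 32, and the last cell, which
has no valid hour at all, comes out as NaN. In your script the equivalent call
would go where big_array.mean(axis=0) is now.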

Hope this helps,
Jonathan

On Wed, Aug 17, 2011 at 8:17 AM, questions anon <questions.anon@gmail.com> wrote:

> I am trying to run simple stats on a bunch of monthly netCDF files with
> hourly temperature data. With help from this list I am able to loop through
> and calculate the mean, but in doing this I have discovered that there are
> some hours that have no values or -32767. I am sure there are some cases
> where I could slice out the section (if I know where they are) but is there
> a way I could just ignore these hours and calculate the mean?
> I have found something called "numpy.isnan" but this does not seem to work.
>
> from netCDF4 import Dataset
> import matplotlib.pyplot as plt
> import numpy as N
> from mpl_toolkits.basemap import Basemap
> import os
>
> MainFolder=r"E:/temp_samples/"
>
> all_TSFC=[]
> for (path, dirs, files) in os.walk(MainFolder):
>     for dir in dirs:
>         print dir
>     path=path+'/'
>     for ncfile in files:
>         if ncfile[-3:]=='.nc':
>             ncfile=os.path.join(path,ncfile)
>             ncfile=Dataset(ncfile, 'r+', 'NETCDF4')
>             TSFC=ncfile.variables['T_SFC'][:]
>             LAT=ncfile.variables['latitude'][:]
>             LON=ncfile.variables['longitude'][:]
>             TIME=ncfile.variables['time'][:]
>             fillvalue=ncfile.variables['T_SFC']._FillValue
>             ncfile.close()
>
> #combine all TSFC to make one array for analyses
>             all_TSFC.append(TSFC)
>
> big_array=N.concatenate(all_TSFC)
> Mean=big_array.mean(axis=0)
> print "the mean is", Mean
>
>  #plot output summary stats
> map = Basemap(projection='merc',llcrnrlat=-40,urcrnrlat=-33,
>               llcrnrlon=139.0,urcrnrlon=151.0,lat_ts=0,resolution='i')
> x,y=map(*N.meshgrid(LON,LAT))
> CS = map.contourf(x,y,Mean, cmap=plt.cm.jet)
> l,b,w,h =0.1,0.1,0.8,0.8
> cax = plt.axes([l+w+0.025, b, 0.025, h])
> plt.colorbar(CS,cax=cax, drawedges=True)
>
> plt.show()


-- 
Jonathan Rocher, PhD
Scientific software developer
Enthought, Inc.
jrocher@enthought.com
1-512-536-1057
http://www.enthought.com