[Numpy-discussion] [Cdat-discussion] Arrays containing NaNs
Charles Doutriaux
doutriaux1@llnl....
Fri Jul 25 12:09:56 CDT 2008
I mean not having to do it myself:
data is a numpy array with NaN in it,
masked_data = numpy.ma.array(data)
would return a masked array with a mask where the NaNs were in data.
C.
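
A minimal sketch of the behaviour being asked for, assuming a NumPy
version that provides numpy.ma.masked_invalid (that function is not
mentioned in this thread, so treat its use here as an assumption about
the reader's installation). Plain numpy.ma.array(data) leaves NaN
unmasked, whereas masked_invalid masks NaN and +/-inf in one call:

```python
import numpy

# Sample data containing both NaN and inf (illustrative values only).
data = numpy.array([2.0, numpy.nan, 1.0, numpy.inf])

# Plain construction: no element is masked, NaN included.
plain = numpy.ma.array(data)

# masked_invalid masks every NaN and +/-inf entry.
masked_data = numpy.ma.masked_invalid(data)

print(numpy.ma.getmaskarray(plain))        # no True entries
print(numpy.ma.getmaskarray(masked_data))  # True where data is NaN or inf
print(masked_data.mean())                  # mean over the valid entries, 1.5
```

Note that this also masks inf, which may or may not be wanted; to mask
only NaN, numpy.ma.masked_where(numpy.isnan(data), data) is the
narrower alternative already used later in this thread.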
Bruce Southey wrote:
> Charles Doutriaux wrote:
>
>> Hi Bruce,
>>
>> Thanks for the reply. We're aware of this; basically the question was: why
>> not mask NaN automatically when creating a numpy.ma array?
>>
>> C.
>>
>> Bruce Southey wrote:
>>
>>
>>> Charles Doutriaux wrote:
>>>
>>>
>>>
>>>> Hi Stephane,
>>>>
>>>> This is a good suggestion. I'm cc'ing the numpy list on this because I'm
>>>> wondering if it wouldn't be a better fit to do it directly at the
>>>> numpy.ma level.
>>>>
>>>> I'm sure they have already thought about this (and 'inf' values as well),
>>>> and if they don't do it, there's probably some good reason we haven't
>>>> thought of yet.
>>>> So before I go ahead and do it in MV2, I'd like to know the reasons why
>>>> it's not in numpy.ma; they are probably valid for MVs too.
>>>>
>>>> C.
>>>>
>>>> Stephane Raynaud wrote:
>>>>
>>>>
>>>>
>>>>
>>>>> Hi,
>>>>>
>>>>> how about automatically (or at least optionally) masking all NaN
>>>>> values when creating a MV array?
>>>>>
>>>>> On Thu, Jul 24, 2008 at 11:43 PM, Arthur M. Greene
>>>>> <amg@iri.columbia.edu> wrote:
>>>>>
>>>>> Yup, this works. Thanks!
>>>>>
>>>>> I guess it's time for me to dig deeper into numpy syntax and
>>>>> functions, now that CDAT is using the numpy core for array
>>>>> management...
>>>>>
>>>>> Best,
>>>>>
>>>>> Arthur
>>>>>
>>>>>
>>>>> Charles Doutriaux wrote:
>>>>>
>>>>> Seems right to me,
>>>>>
>>>>> Except that the syntax might scare new users a bit :)
>>>>>
>>>>> C.
>>>>>
>>>>> Andrew.Dawson@uea.ac.uk wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'm not sure if what I am about to suggest is a good idea
>>>>> or not, perhaps Charles will correct me if this is a bad
>>>>> idea for any reason.
>>>>>
>>>>> Let's say you have a cdms variable called U with NaNs as
>>>>> the missing value. First we can replace the NaNs with 1e20:
>>>>>
>>>>> U.data[numpy.where(numpy.isnan(U.data))] = 1e20
>>>>>
>>>>> And remember to set the missing value of the variable
>>>>> appropriately:
>>>>>
>>>>> U.setMissing(1e20)
>>>>>
>>>>> I hope that helps, Andrew
>>>>>
>>>>>
>>>>>
>>>>> Hi Arthur,
>>>>>
>>>>> If I remember correctly, the way I used to do it was:
>>>>> a = MV2.greater(data, 1.)
>>>>> b = MV2.less_equal(data, 1.)
>>>>> # NaN fails both comparisons, so NaNs are the only ones left
>>>>> c = MV2.logical_not(MV2.logical_or(a, b))
>>>>> data = MV2.masked_where(c, data)
>>>>>
>>>>> BUT I believe numpy now has a way to deal with NaN; I
>>>>> believe it is numpy.nan_to_num. But it replaces NaN with 0,
>>>>> so it may not be what you want.
>>>>>
>>>>> C.
>>>>>
>>>>>
>>>>> Arthur M. Greene wrote:
>>>>>
>>>>> A typical netcdf file is opened, and the single
>>>>> variable extracted:
>>>>>
>>>>>
>>>>> fpr=cdms.open('prTS2p1_SEA_allmos.cdf')
>>>>> pr0=fpr('prcp')
>>>>> type(pr0)
>>>>>
>>>>> <class 'cdms2.tvariable.TransientVariable'>
>>>>>
>>>>> Masked values (indicating ocean in this case) show
>>>>> up here as NaNs.
>>>>>
>>>>>
>>>>> pr0[0,-15:-5,0]
>>>>>
>>>>> prcp array([NaN NaN NaN NaN NaN NaN 0.37745094
>>>>> 0.3460784 0.21960783 0.19117641])
>>>>>
>>>>> So far this is all consistent. A map of the first
>>>>> time step shows the proper land-ocean boundaries,
>>>>> reasonable-looking values, and so on. But there
>>>>> doesn't seem to be any way to mask this array so that,
>>>>> e.g., an 'xy' average can be computed (it comes out all
>>>>> NaNs). NaN is not equal to anything -- even itself -- so
>>>>> there does not seem to be any condition, among the
>>>>> MV.masked_xxx options, that can be applied as a test.
>>>>> Also, it does not seem possible to compute seasonal
>>>>> averages, anomalies, etc. -- they also produce just NaNs.
>>>>>
>>>>> The workaround I've come up with -- for now -- is
>>>>> to first generate a new array of identical shape,
>>>>> filled with 1.0E+20. One test I've found that can
>>>>> detect NaNs is numpy.isnan:
>>>>>
>>>>>
>>>>> isnan(pr0[0,0,0])
>>>>>
>>>>> True
>>>>>
>>>>> So it is _possible_ to tediously loop through
>>>>> every value in the old array, testing with isnan,
>>>>> then copying to the new array if the test fails.
>>>>> Then the axes have to be reset...
>>>>>
>>>>> isnan does not accept array arguments, so one
>>>>> cannot do, e.g.,
>>>>>
>>>>> prmasked=MV.masked_where(isnan(pr0),pr0)
>>>>>
>>>>> The element-by-element conversion is quite slow.
>>>>> (I'm still waiting for it to complete, in fact).
>>>>> Any suggestions for dealing with NaN-infested data
>>>>> objects?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> AMG
>>>>>
>>>>> P.S. This is 5.0.0.beta, RHEL4.
>>>>>
>>>>>
>>>>> *^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*
>>>>> Arthur M. Greene, Ph.D.
>>>>> The International Research Institute for Climate and Society
>>>>> The Earth Institute, Columbia University, Lamont Campus
>>>>> Monell Building, 61 Route 9W, Palisades, NY 10964-8000 USA
>>>>> amg*at*iri-dot-columbia\dot\edu | http://iri.columbia.edu
>>>>> *^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Cdat-discussion mailing list
>>>>> Cdat-discussion@lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/cdat-discussion
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Stephane Raynaud
>>>>> ------------------------------------------------------------------------
>>>> _______________________________________________
>>>> Numpy-discussion mailing list
>>>> Numpy-discussion@scipy.org
>>>> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>>>>
>>>>
>>>>
>>>>
>>>>
>>> Please look at the various NumPy functions that ignore NaN, like nansum().
>>> See the NumPy example list
>>> (http://www.scipy.org/Numpy_Example_List_With_Doc) for examples under
>>> nan or individual functions.
>>>
>>> To get the mean you can do something like:
>>>
>>> import numpy
>>> x = numpy.array([2, numpy.nan, 1])
>>> numpy.nansum(x)/(x.shape[0]-numpy.isnan(x).sum())
>>> x_masked = numpy.ma.masked_where(numpy.isnan(x), x)
>>> x_masked.mean()
>>>
>>> The real advantage of masked arrays is that you have greater control
>>> over the filtering so you can also filter extreme values:
>>>
>>> y = numpy.array([2, numpy.nan, 1, 1000])
>>> y_masked = numpy.ma.masked_where(numpy.isnan(y), y)
>>> y_masked = numpy.ma.masked_where(y_masked > 100, y_masked)
>>> y_masked.mean()
>>>
>>> Regards
>>> Bruce
> You mean like doing:
>
> import numpy
> y = numpy.array([2., numpy.nan, 1., 1000.])
> y_masked = numpy.ma.MaskedArray(y, mask=numpy.isnan(y))
>
> ?
>
> Bruce