[Numpy-discussion] [Cdat-discussion] Arrays containing NaNs

Bruce Southey bsouthey@gmail....
Fri Jul 25 11:04:21 CDT 2008


Charles Doutriaux wrote:
> Hi Bruce,
>
> Thx for the reply, we're aware of this, basically the question was why 
> not mask NaN automatically when creating a numpy.ma array?
>
> C.
>
> Bruce Southey wrote:
>   
>> Charles Doutriaux wrote:
>>   
>>     
>>> Hi Stephane,
>>>
>>> This is a good suggestion, I'm ccing the numpy list on this. Because I'm 
>>> wondering if it wouldn't be a better fit to do it directly at the 
>>> numpy.ma level.
>>>
>>> I'm sure they already thought about this (and 'inf' values as well), and 
>>> if they don't do it, there's probably some good reason we haven't thought 
>>> of yet.
>>> So before I go ahead and do it in MV2, I'd like to know the reasons why 
>>> it's not in numpy.ma; they are probably valid for MVs too.
>>>
>>> C.
>>>
>>> Stephane Raynaud wrote:
>>>   
>>>     
>>>       
>>>> Hi,
>>>>
>>>> how about automatically (or at least optionally) masking all NaN 
>>>> values when creating a MV array?
>>>>
>>>> On Thu, Jul 24, 2008 at 11:43 PM, Arthur M. Greene 
>>>> <amg@iri.columbia.edu> wrote:
>>>>
>>>>     Yup, this works. Thanks!
>>>>
>>>>     I guess it's time for me to dig deeper into numpy syntax and
>>>>     functions, now that CDAT is using the numpy core for array
>>>>     management...
>>>>
>>>>     Best,
>>>>
>>>>     Arthur
>>>>
>>>>
>>>>     Charles Doutriaux wrote:
>>>>
>>>>         Seems right to me,
>>>>
>>>>         Except that the syntax might scare new users a bit :)
>>>>
>>>>         C.
>>>>
>>>>         Andrew.Dawson@uea.ac.uk wrote:
>>>>
>>>>             Hi,
>>>>
>>>>             I'm not sure if what I am about to suggest is a good idea
>>>>             or not, perhaps Charles will correct me if this is a bad
>>>>             idea for any reason.
>>>>
>>>>             Let's say you have a cdms variable called U with NaNs as
>>>>             the missing value. First we can replace the NaNs with
>>>>             1e20:
>>>>
>>>>             U.data[numpy.where(numpy.isnan(U.data))] = 1e20
>>>>
>>>>             And remember to set the missing value of the variable
>>>>             appropriately:
>>>>
>>>>             U.setMissing(1e20)
>>>>
>>>>             I hope that helps, Andrew
>>>>
>>>>
>>>>
>>>>                 Hi Arthur,
>>>>
>>>>                 If I remember correctly, the way I used to do it was:
>>>>                 a=MV2.greater(data,1.)
>>>>                 b=MV2.less_equal(data,1.)
>>>>                 c=MV2.logical_not(MV2.logical_or(a,b)) # NaNs are the only ones left
>>>>                 data=MV2.masked_where(c,data)
>>>>
>>>>                 BUT I believe numpy now has a way to deal with NaN; I
>>>>                 believe it is numpy.nan_to_num. But it replaces NaN
>>>>                 with 0, so it may not be what you want.
>>>>
>>>>                 C.
>>>>
>>>>
>>>>                 Arthur M. Greene wrote:
>>>>
>>>>                     A typical netcdf file is opened, and the single
>>>>                     variable extracted:
>>>>
>>>>
>>>>                                 fpr=cdms.open('prTS2p1_SEA_allmos.cdf')
>>>>                                 pr0=fpr('prcp')
>>>>                                 type(pr0)
>>>>
>>>>                     <class 'cdms2.tvariable.TransientVariable'>
>>>>
>>>>                     Masked values (indicating ocean in this case) show
>>>>                     up here as NaNs.
>>>>
>>>>
>>>>                                 pr0[0,-15:-5,0]
>>>>
>>>>                     prcp array([NaN NaN NaN NaN NaN NaN 0.37745094
>>>>                     0.3460784 0.21960783 0.19117641])
>>>>
>>>>                     So far this is all consistent. A map of the first
>>>>                     time step shows the proper land-ocean boundaries,
>>>>                     reasonable-looking values, and so on. But there
>>>>                     doesn't seem to be any way to mask this array so
>>>>                     that, e.g., an 'xy' average can be computed (it
>>>>                     comes out all NaNs). NaN is not equal to anything
>>>>                     -- even itself -- so there does not seem to be any
>>>>                     condition, among the MV.masked_xxx options, that
>>>>                     can be applied as a test. Also, it does not seem
>>>>                     possible to compute seasonal averages, anomalies,
>>>>                     etc. -- they also produce just NaNs.
>>>>
>>>>                     The workaround I've come up with -- for now -- is
>>>>                     to first generate a new array of identical shape,
>>>>                     filled with 1.0E+20. One test I've found that can
>>>>                     detect NaNs is numpy.isnan:
>>>>
>>>>
>>>>                                 isnan(pr0[0,0,0])
>>>>
>>>>                     True
>>>>
>>>>                     So it is _possible_ to tediously loop through
>>>>                     every value in the old array, testing with isnan,
>>>>                     then copying to the new array if the test fails.
>>>>                     Then the axes have to be reset...
>>>>
>>>>                     isnan does not accept array arguments, so one
>>>>                     cannot do, e.g.,
>>>>
>>>>                     prmasked=MV.masked_where(isnan(pr0),pr0)
>>>>
>>>>                     The element-by-element conversion is quite slow.
>>>>                     (I'm still waiting for it to complete, in fact).
>>>>                     Any suggestions for dealing with NaN-infested data
>>>>                     objects?
>>>>
>>>>                     Thanks!
>>>>
>>>>                     AMG
>>>>
>>>>                     P.S. This is 5.0.0.beta, RHEL4.
>>>>
>>>>
>>>>     *^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*
>>>>     Arthur M. Greene, Ph.D.
>>>>     The International Research Institute for Climate and Society
>>>>     The Earth Institute, Columbia University, Lamont Campus
>>>>     Monell Building, 61 Route 9W, Palisades, NY  10964-8000 USA
>>>>     amg*at*iri-dot-columbia\dot\edu | http://iri.columbia.edu
>>>>     *^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*
>>>>
>>>>
>>>>     -------------------------------------------------------------------------
>>>>     This SF.Net email is sponsored by the Moblin Your Move Developer's
>>>>     challenge
>>>>     Build the coolest Linux based applications with Moblin SDK & win
>>>>     great prizes
>>>>     Grand prize is a trip for two to an Open Source event anywhere in
>>>>     the world
>>>>     http://moblin-contest.org/redirect.php?banner_id=100&url=/
>>>>     _______________________________________________
>>>>     Cdat-discussion mailing list
>>>>     Cdat-discussion@lists.sourceforge.net
>>>>     https://lists.sourceforge.net/lists/listinfo/cdat-discussion
>>>>
>>>>
>>>>
>>>>
>>>> -- 
>>>> Stephane Raynaud
>>> _______________________________________________
>>> Numpy-discussion mailing list
>>> Numpy-discussion@scipy.org
>>> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>>   
>>>     
>>>       
>> Please look at the various NumPy functions that ignore NaN, such as 
>> nansum(). See the NumPy example list 
>> (http://www.scipy.org/Numpy_Example_List_With_Doc) for examples under 
>> nan or under the individual functions.
>>
>> To get the mean you can do something like:
>>
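>> import numpy
>> x = numpy.array([2, numpy.nan, 1])
>> # mean via nansum: divide by the number of non-NaN elements
>> numpy.nansum(x)/(x.shape[0]-numpy.isnan(x).sum())
>> # same mean via a masked array with the NaNs masked
>> x_masked = numpy.ma.masked_where(numpy.isnan(x), x)
>> x_masked.mean()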
>>
>> The real advantage of masked arrays is that you have greater control 
>> over the filtering so you can also filter extreme values:
>>
>> y = numpy.array([2, numpy.nan, 1, 1000])
>> y_masked = numpy.ma.masked_where(numpy.isnan(y), y)
>> # additionally mask values above 100
>> y_masked = numpy.ma.masked_where(y_masked > 100, y_masked)
>> y_masked.mean()
>>
>> Regards
>> Bruce
>
You mean like doing:

import numpy
y = numpy.array([2., numpy.nan, 1., 1000.])  # define y before it is used for the mask
y_masked = numpy.ma.MaskedArray(y, mask=numpy.isnan(y))

?

Bruce
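
For what it's worth, here is a minimal sketch of masking invalid values at
construction time. It assumes a NumPy release that provides
numpy.ma.masked_invalid (older releases may not have it), and the array
values and variable names are made up for illustration. It compares an
explicit isnan mask with masked_invalid, which also masks infinities, and
then layers a second condition on top to drop extreme values.

```python
import numpy as np

# Example values only: one NaN and one infinity mixed with ordinary data.
raw = np.array([2.0, np.nan, 1.0, np.inf, 1000.0])

# Option 1: build the mask explicitly from the data (NaN only).
explicit = np.ma.MaskedArray(raw, mask=np.isnan(raw))

# Option 2: let numpy.ma mask every non-finite entry (NaN and +/-inf).
auto = np.ma.masked_invalid(raw)

# A second condition can be layered on top, e.g. to drop extreme values.
trimmed = np.ma.masked_where(auto > 100, auto)

nan_mask = explicit.mask.tolist()
auto_mask = auto.mask.tolist()
trimmed_mean = float(trimmed.mean())
print(nan_mask, auto_mask, trimmed_mean)
```

Note that the explicit isnan mask leaves the infinity unmasked, so any
average taken over it is still not finite; whether inf should be treated
as missing is a choice that depends on the data.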



