[Numpy-discussion] [Cdat-discussion] Arrays containing NaNs
Bruce Southey
bsouthey@gmail....
Fri Jul 25 10:43:34 CDT 2008
Charles Doutriaux wrote:
> Hi Stephane,
>
> This is a good suggestion, I'm ccing the numpy list on this. Because I'm
> wondering if it wouldn't be a better fit to do it directly at the
> numpy.ma level.
>
> I'm sure they already thought about this (and 'inf' values as well) and
> if they don't do it, there's probably some good reason we haven't thought
> of yet.
> So before I go ahead and do it in MV2, I'd like to know the reasons why
> it's not in numpy.ma; they are probably valid for MVs too.
>
> C.
>
> Stephane Raynaud wrote:
>
>> Hi,
>>
>> how about automatically (or at least optionally) masking all NaN
>> values when creating a MV array?
>>
>> On Thu, Jul 24, 2008 at 11:43 PM, Arthur M. Greene
>> <amg@iri.columbia.edu> wrote:
>>
>> Yup, this works. Thanks!
>>
>> I guess it's time for me to dig deeper into numpy syntax and
>> functions, now that CDAT is using the numpy core for array
>> management...
>>
>> Best,
>>
>> Arthur
>>
>>
>> Charles Doutriaux wrote:
>>
>> Seems right to me,
>>
>> Except that the syntax might scare new users a bit :)
>>
>> C.
>>
>> Andrew.Dawson@uea.ac.uk wrote:
>>
>> Hi,
>>
>> I'm not sure if what I am about to suggest is a good idea
>> or not, perhaps Charles will correct me if this is a bad
>> idea for any reason.
>>
>> Let's say you have a cdms variable called U with NaNs as
>> the missing value. First we can replace the NaNs with 1e20:
>>
>> U.data[numpy.where(numpy.isnan(U.data))] = 1e20
>>
>> And remember to set the missing value of the variable
>> appropriately:
>>
>> U.setMissing(1e20)
>>
>> I hope that helps, Andrew
>>
>>
>>
>> Hi Arthur,
>>
>> If I remember correctly, the way I used to do it was:
>> a = MV2.greater(data, 1.)
>> b = MV2.less_equal(data, 1.)
>> c = MV2.logical_not(MV2.logical_or(a, b)) # NaNs are the only values that fail both tests
>> data = MV2.masked_where(c, data)
>>
>> BUT I believe numpy now has a way to deal with NaN; I
>> believe it is numpy.nan_to_num. But it replaces NaN with 0,
>> so it may not be what you want.
>>
>> C.
>>
>>
>> Arthur M. Greene wrote:
>>
>> A typical netcdf file is opened, and the single
>> variable extracted:
>>
>>
>> fpr=cdms.open('prTS2p1_SEA_allmos.cdf')
>> pr0=fpr('prcp')
>> type(pr0)
>>
>> <class 'cdms2.tvariable.TransientVariable'>
>>
>> Masked values (indicating ocean in this case) show
>> up here as NaNs.
>>
>>
>> pr0[0,-15:-5,0]
>>
>> prcp array([NaN NaN NaN NaN NaN NaN 0.37745094
>> 0.3460784 0.21960783 0.19117641])
>>
>> So far this is all consistent. A map of the first
>> time step shows the proper land-ocean boundaries,
>> reasonable-looking values, and so on. But there
>> doesn't seem to be any way to mask this array, so,
>> e.g., an 'xy' average can be computed (it comes out
>> all NaNs). NaN is not equal to anything -- even
>> itself -- so there does not seem to be any condition,
>> among the MV.masked_xxx options, that can be applied
>> as a test. Also, it does not seem possible to compute
>> seasonal averages, anomalies, etc. -- they also
>> produce just NaNs.
>>
>> The workaround I've come up with -- for now -- is
>> to first generate a new array of identical shape,
>> filled with 1.0E+20. One test I've found that can
>> detect NaNs is numpy.isnan:
>>
>>
>> isnan(pr0[0,0,0])
>>
>> True
>>
>> So it is _possible_ to tediously loop through
>> every value in the old array, testing with isnan,
>> then copying to the new array if the test fails.
>> Then the axes have to be reset...
>>
>> isnan does not accept array arguments, so one
>> cannot do, e.g.,
>>
>> prmasked=MV.masked_where(isnan(pr0),pr0)
>>
>> The element-by-element conversion is quite slow.
>> (I'm still waiting for it to complete, in fact).
>> Any suggestions for dealing with NaN-infested data
>> objects?
>>
>> Thanks!
>>
>> AMG
>>
>> P.S. This is 5.0.0.beta, RHEL4.
>>
>>
>> *^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*
>> Arthur M. Greene, Ph.D.
>> The International Research Institute for Climate and Society
>> The Earth Institute, Columbia University, Lamont Campus
>> Monell Building, 61 Route 9W, Palisades, NY 10964-8000 USA
>> amg*at*iri-dot-columbia\dot\edu | http://iri.columbia.edu
>> *^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*^*~*
>>
>> --
>> Stephane Raynaud
>
Please look at the various NumPy functions that ignore NaN, such as
nansum(). See the NumPy example list
(http://www.scipy.org/Numpy_Example_List_With_Doc) for examples, either
under nan or under the individual functions.
To get the mean you can do something like:
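import numpy
x = numpy.array([2, numpy.nan, 1])
# Mean ignoring NaN: NaN-aware sum divided by the count of non-NaN values
numpy.nansum(x) / (x.shape[0] - numpy.isnan(x).sum())
# Same result with a masked array: mask the NaNs, then take the mean
x_masked = numpy.ma.masked_where(numpy.isnan(x), x)
x_masked.mean()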
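As a cross-check, here is a minimal sketch that runs both approaches side by side and compares them with numpy.nanmean. This assumes a NumPy recent enough to provide nanmean (it was added in later releases, so it may not be available in your installation); the variable names are only for illustration. Note that numpy.isnan is applied to the whole array at once, so no element-by-element loop is needed.

```python
import numpy

x = numpy.array([2, numpy.nan, 1])

# numpy.isnan works elementwise on a whole array and returns a boolean array.
nan_flags = numpy.isnan(x)

# Approach 1: NaN-aware sum divided by the number of non-NaN values.
manual_mean = numpy.nansum(x) / (x.shape[0] - nan_flags.sum())

# Approach 2: mask the NaNs and let the masked array skip them.
masked_mean = numpy.ma.masked_where(nan_flags, x).mean()

# Approach 3 (assumes numpy.nanmean exists in your NumPy version).
builtin_mean = numpy.nanmean(x)

print(manual_mean, masked_mean, builtin_mean)
```

All three should agree, since each one averages only the two valid values.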
The real advantage of masked arrays is that you have greater control
over the filtering so you can also filter extreme values:
y = numpy.array([2, numpy.nan, 1, 1000])
y_masked = numpy.ma.masked_where(numpy.isnan(y), y)
y_masked = numpy.ma.masked_where(y_masked > 100, y_masked)
y_masked.mean()
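Since 'inf' values were also mentioned earlier in the thread, here is a hedged sketch of the same filtering using numpy.ma.masked_invalid, which masks NaN and infinite values in one step, followed by numpy.ma.masked_greater for the extreme values. I am assuming your NumPy version provides both functions; the threshold of 100 is just the illustrative value used above.

```python
import numpy

y = numpy.array([2, numpy.nan, 1, 1000, numpy.inf])

# Mask everything that is not finite (NaN, +inf, -inf) in one call.
y_masked = numpy.ma.masked_invalid(y)

# Additionally mask extreme values; the existing mask is preserved.
y_masked = numpy.ma.masked_greater(y_masked, 100)

mean = y_masked.mean()                      # average of the unmasked values only
n_masked = numpy.ma.count_masked(y_masked)  # number of masked elements

print(y_masked, mean, n_masked)
```

Only the values 2 and 1 remain unmasked here, so the mean is taken over those two.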
Regards
Bruce