[Numpy-discussion] NA masks in the next numpy release?
Charles R Harris
charlesr.harris@gmail....
Mon Oct 24 12:48:40 CDT 2011
On Mon, Oct 24, 2011 at 11:12 AM, Wes McKinney <wesmckinn@gmail.com> wrote:
> On Mon, Oct 24, 2011 at 10:54 AM, Charles R Harris
> <charlesr.harris@gmail.com> wrote:
> >
> >
> > On Mon, Oct 24, 2011 at 8:40 AM, Charles R Harris
> > <charlesr.harris@gmail.com> wrote:
> >>
> >>
> >> On Sun, Oct 23, 2011 at 11:23 PM, Wes McKinney <wesmckinn@gmail.com>
> >> wrote:
> >>>
> >>> On Sun, Oct 23, 2011 at 8:07 PM, Eric Firing <efiring@hawaii.edu> wrote:
> >>> > On 10/23/2011 12:34 PM, Nathaniel Smith wrote:
> >>> >
> >>> >> like. And in this case I do think we can come up with an API that will
> >>> >> make everyone happy, but that Mark's current API probably can't be
> >>> >> incrementally evolved to become that API.)
> >>> >>
> >>> >
> >>> > No one could object to coming up with an API that makes everyone happy,
> >>> > provided that it actually gets coded up, tested, and is found to be fast
> >>> > and maintainable. When you say the API probably can't be evolved, do
> >>> > you mean that the underlying implementation also has to be redone? And
> >>> > if so, who will do it, and when?
> >>> >
> >>> > Eric
> >>> > _______________________________________________
> >>> > NumPy-Discussion mailing list
> >>> > NumPy-Discussion@scipy.org
> >>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >>> >
> >>>
> >>> I personally am a bit apprehensive as I am worried about the masked
> >>> array abstraction "leaking" through to users of pandas, something
> >>> which I simply will not accept (why I decided against using numpy.ma
> >>> early on, that + performance problems). Basically if having an
> >>> understanding of masked arrays is a prerequisite for using pandas, the
> >>> whole thing is DOA to me as it undermines the usability arguments I've
> >>> been making about switching to Python (from R) for data analysis and
> >>> statistical computing.
> >>
> >> The missing data functionality looks far more like R than numpy.ma.
> >>
> >
> > For instance
> >
> > In [8]: a = arange(5, maskna=1)
> >
> > In [9]: a[2] = np.NA
> >
> > In [10]: a.mean()
> > Out[10]: NA(dtype='float64')
> >
> > In [11]: a.mean(skipna=1)
> > Out[11]: 2.0
> >
> > In [12]: a = arange(5)
> >
> > In [13]: b = a.view(maskna=1)
> >
> > In [14]: a.mean()
> > Out[14]: 2.0
> >
> > In [15]: b[2] = np.NA
> >
> > In [16]: b.mean()
> > Out[16]: NA(dtype='float64')
> >
> > In [17]: b.mean(skipna=1)
> > Out[17]: 2.0
> >
> > Chuck
> >
>
> I don't really agree with you.
>
> some sample R code
>
> > arr <- rnorm(10)
> > arr[5:8] <- NA
> > arr
> [1] 0.6451460 -1.1285552 0.6869828 0.4018868 NA NA
> [7] NA NA 0.3322803 -1.9201257
>
> In your examples you had to pass maskna=True-- I suppose that my only
> recourse would be to make sure that every array inside a DataFrame,
> for example, has maskna=True set. I'll have to look in more detail and
> see if it's feasible/desirable. There's a memory cost to pay, but you
> can't get the functionality for free. I may just end up sticking with
> NaN as it's worked pretty well so far the last few years-- it's an
> impure solution but one with reasonably good performance
> characteristics in the places that matter.
>
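For comparison, the NaN convention described above can be sketched with calls that exist in released NumPy. This is only an illustration of the sentinel approach, not the maskna API; the variable names are made up for the example, and it assumes a float dtype, since NaN has no integer representation.

```python
import numpy as np

# NaN as a missing-value sentinel; the array must be float.
a = np.arange(5, dtype=float)
a[2] = np.nan

# A plain mean propagates NaN, much like NA does without skipna.
propagated = a.mean()

# Dropping the NaN entries first plays the role of skipna=1.
skipped = a[~np.isnan(a)].mean()
```

No mask is allocated here, which is the memory saving mentioned above, at the cost of not being able to mark missing values in integer or boolean arrays.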
It might be useful to have a way of setting global defaults, or something like
a with statement. These are the sort of things that can be adjusted based on
experience. For instance, I'm thinking skipna=1 is the natural default for
the masked arrays.
Chuck