[Numpy-discussion] NA masks for NumPy are ready to test

Mark Wiebe mwwiebe@gmail....
Wed Aug 24 20:35:50 CDT 2011


On Wed, Aug 24, 2011 at 6:09 PM, Wes McKinney <wesmckinn@gmail.com> wrote:

> On Wed, Aug 24, 2011 at 8:19 PM, Mark Wiebe <mwwiebe@gmail.com> wrote:
> > On Fri, Aug 19, 2011 at 11:37 AM, Bruce Southey <bsouthey@gmail.com> wrote:
> >>
> >> Hi,
> >> <snip>
> >>
> >> 2) Can the 'skipna' flag be added to the methods?
> >> >>> a.sum(skipna=True)
> >> Traceback (most recent call last):
> >>  File "<stdin>", line 1, in <module>
> >> TypeError: 'skipna' is an invalid keyword argument for this function
> >> >>> np.sum(a,skipna=True)
> >> nan
> >
> > I've added this now, as well. I think that finishes up the changes you
> > suggested in this email which felt right to me.
> > Cheers,
> > Mark
> >
> >>
> >> <snip>
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> >
>
> Sorry I haven't had a chance to have a tinker yet. My initial observations:
>
> - I haven't decided whether this is a problem:
>
> In [50]: arr = np.arange(100)
>
> In [51]: arr[5:10] = np.NA
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> /home/wesm/<ipython-input-51-7e07a94409e9> in <module>()
> ----> 1 arr[5:10] = np.NA
>
> ValueError: Cannot set NumPy array values to NA values without first
> enabling NA support in the array
>
> I assume when you flip the maskna switch that a mask is created?
>

That's correct, it creates a fully exposed mask when you set the flag. The
thought was that having an assignment automatically add a mask to an array
would be a bad idea ("explicit vs implicit").
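
As a rough analogy only, the same "explicit first" pattern can be sketched
with the existing numpy.ma module. This is a different API from the maskna
work discussed in this thread; the variable names are just for illustration.

```python
import numpy as np

arr = np.arange(100)

# Explicit step: pair the data with an all-False ("fully exposed") mask.
marr = np.ma.masked_array(arr, mask=np.zeros(arr.shape, dtype=bool))

# Only after that step do we mark elements as missing.
marr[5:10] = np.ma.masked

# Reductions on the masked array skip the masked elements.
total = marr.sum()
n_missing = int(np.ma.getmaskarray(marr).sum())
print(total, n_missing)
```

The point is only that the mask is created by a deliberate call, not as a
side effect of an assignment to a plain array.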


>
> - Performance with skipna is a bit disappointing:
>
> In [52]: arr = np.random.randn(1e6)
> In [54]: arr.flags.maskna = True
> In [56]: arr[::2] = np.NA
> In [58]: timeit arr.sum(skipna=True)
> 100 loops, best of 3: 7.31 ms per loop
>
> this goes down to 2.12 ms if there are no NAs present.
>

The alternating case currently gets the worst possible performance. The
masked loop has no specialization for the operation or data type yet; it
simply calls the regular inner loop on each run of unmasked data.
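
To illustrate why alternating NAs are the worst case, here is a minimal
pure-Python sketch of the run-based idea. It is not the actual C
implementation; exposed_runs and masked_sum_by_runs are hypothetical names,
and the mask convention (True means exposed) is an assumption for this
example.

```python
import numpy as np

def exposed_runs(exposed):
    """Return (start, stop) pairs for each contiguous run of True entries."""
    # Pad with False on both sides so every run has a rising and falling edge.
    padded = np.concatenate(([False], exposed, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    # Edges alternate: rising (run start), falling (run stop), ...
    return list(zip(edges[::2], edges[1::2]))

def masked_sum_by_runs(data, exposed):
    """Sum the exposed elements by calling the plain reduction once per run."""
    total = 0.0
    for start, stop in exposed_runs(exposed):
        total += np.add.reduce(data[start:stop])
    return total

data = np.arange(10, dtype=float)
exposed = np.ones(10, dtype=bool)
exposed[::2] = False  # every other element is missing

print(len(exposed_runs(exposed)))         # number of separate inner-loop calls
print(masked_sum_by_runs(data, exposed))
```

With an alternating mask every run has length one, so the regular loop is
invoked once per exposed element and the per-call overhead dominates.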


> but:
>
> In [59]: import bottleneck as bn
> In [60]: arr = np.random.randn(1e6)
> In [61]: arr[::2] = np.nan
> In [62]: timeit bn.nansum(arr)
> 1000 loops, best of 3: 1.17 ms per loop
>
> do you have a sense if this gap can be closed? I assume you've been,
> as you should, focused on a correct implementation as opposed to
> squeezing out performance.
>

I've been focusing on a correct implementation while installing hooks in the
right places so that the performance can be improved later. For the
straightforward masked copying code, I previously created a ticket
describing what needs to be done:

http://projects.scipy.org/numpy/ticket/1901

For element-wise ufuncs, the changes needed are similar: creating inner
loops specialized for masks. While making these changes, I also figured out
a way to specialize the inner loops more fully, along the lines of einsum,
without breaking ABI compatibility, so I set up the API as required for
this.

Thanks for taking a look,
Mark


>
> best,
> Wes