[Numpy-discussion] NA masks for NumPy are ready to test

Wes McKinney wesmckinn@gmail....
Wed Aug 24 20:09:35 CDT 2011


On Wed, Aug 24, 2011 at 8:19 PM, Mark Wiebe <mwwiebe@gmail.com> wrote:
> On Fri, Aug 19, 2011 at 11:37 AM, Bruce Southey <bsouthey@gmail.com> wrote:
>>
>> Hi,
>> <snip>
>>
>> 2) Can the 'skipna' flag be added to the methods?
>> >>> a.sum(skipna=True)
>> Traceback (most recent call last):
>>  File "<stdin>", line 1, in <module>
>> TypeError: 'skipna' is an invalid keyword argument for this function
>> >>> np.sum(a,skipna=True)
>> nan
>
> I've added this now, as well. I think that finishes up the changes you
> suggested in this email which felt right to me.
> Cheers,
> Mark
>
>>
>> <snip>
>

Sorry I haven't had much of a chance to tinker yet. My initial observations:

- I haven't decided whether this is a problem:

In [50]: arr = np.arange(100)

In [51]: arr[5:10] = np.NA
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/home/wesm/<ipython-input-51-7e07a94409e9> in <module>()
----> 1 arr[5:10] = np.NA

ValueError: Cannot set NumPy array values to NA values without first
enabling NA support in the array

I assume a mask is created when you flip the maskna switch?
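
For comparison, here is a minimal sketch of the same lazy-mask idea using the
existing numpy.ma module. This is only a stand-in: np.NA and the maskna flag
belong to the branch being tested here, and numpy.ma may well behave
differently from it. The variable names are just for illustration.

```python
import numpy as np

# Stand-in only: numpy.ma, not the NA-mask branch discussed in this thread.
arr = np.ma.array(np.arange(100))

# A fresh masked array carries no mask storage, only the `nomask` sentinel.
mask_before = arr.mask is np.ma.nomask

# Masking a slice allocates a full boolean mask on demand.
arr[5:10] = np.ma.masked
n_masked = int(np.ma.getmaskarray(arr).sum())

# Reductions skip the masked entries (5 + 6 + 7 + 8 + 9 are left out).
total = arr.sum()

print(mask_before, n_masked, total)
```

In numpy.ma the mask only comes into existence once something is actually
masked, which is roughly the behavior I am asking about for maskna.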

- Performance with skipna is a bit disappointing:

In [52]: arr = np.random.randn(1e6)
In [54]: arr.flags.maskna = True
In [56]: arr[::2] = np.NA
In [58]: timeit arr.sum(skipna=True)
100 loops, best of 3: 7.31 ms per loop

this goes down to 2.12 ms if there are no NAs present.

but:

In [59]: import bottleneck as bn
In [60]: arr = np.random.randn(1e6)
In [61]: arr[::2] = np.nan
In [62]: timeit bn.nansum(arr)
1000 loops, best of 3: 1.17 ms per loop

do you have a sense of whether this gap can be closed? I assume you've been,
as you should, focused on a correct implementation rather than on
squeezing out performance.
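
A rough way to reproduce this kind of comparison without the NA branch is
sketched below. It assumes numpy.ma as a stand-in for the masked path and
np.nansum in place of bn.nansum (so that Bottleneck is not required); the
best_ms helper is hypothetical, and the numbers it prints will depend on the
machine and the NumPy version, so they are not comparable to the timings above.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Same data in three forms: no missing values, NaN as the missing marker,
# and a separate boolean mask (numpy.ma, used here as a stand-in).
plain = rng.standard_normal(n)
with_nan = plain.copy()
with_nan[::2] = np.nan
masked = np.ma.array(plain, mask=np.isnan(with_nan))


def best_ms(fn, repeat=3, number=10):
    """Best per-call time in milliseconds (hypothetical helper)."""
    return min(timeit.repeat(fn, repeat=repeat, number=number)) / number * 1e3


timings = {
    "plain sum, no missing values": best_ms(plain.sum),
    "np.nansum, half NaN": best_ms(lambda: np.nansum(with_nan)),
    "masked sum, half masked": best_ms(masked.sum),
}

for label, ms in timings.items():
    print(f"{label}: {ms:.2f} ms per loop")
```

Both missing-value paths should return the same sum, so any difference in
the timings reflects the cost of the representation rather than the result.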

best,
Wes

