Mark Wiebe mwwiebe@gmail....
Wed Aug 24 21:29:44 CDT 2011

```
On Wed, Aug 24, 2011 at 6:09 PM, Wes McKinney <wesmckinn@gmail.com> wrote:

<snip>
>
> - Performance with skipna is a bit disappointing:
>
> In [52]: arr = np.random.randn(1e6)
> In [54]: arr.flags.maskna = True
> In [56]: arr[::2] = np.NA
> In [58]: timeit arr.sum(skipna=True)
> 100 loops, best of 3: 7.31 ms per loop
>
> this goes down to 2.12 ms if there are no NAs present.
>
> but:
>
> In [59]: import bottleneck as bn
> In [60]: arr = np.random.randn(1e6)
> In [61]: arr[::2] = np.nan
> In [62]: timeit bn.nansum(arr)
> 1000 loops, best of 3: 1.17 ms per loop
>
> do you have a sense if this gap can be closed? I assume you've been,
> as you should, focused on a correct implementation as opposed to
> squeezing out performance.
>

It looks like the spdiv example module I created for the C-API documentation
can give a bit of an idea for some performance expectations. The example has
no specialization for strides, and it operates exactly like np.divide except
it converts the output to NA instead of dividing by zero. It *always*
creates an NA mask in the output. Here are some timings from the example module:

In [1]: from spdiv_mod import spdiv

In [2]: arr = np.random.randn(1e6)

Since spdiv always creates an NA mask, this is comparing an NA-masked divide
with a regular NumPy divide:

In [3]: timeit spdiv(arr, 3.1)
100 loops, best of 3: 13.8 ms per loop

In [4]: timeit arr / 3.1
10 loops, best of 3: 11.4 ms per loop

Here, the divide is causing an NA mask to be created in the output, just
like in spdiv:

In [5]: timeit spdiv(arr, np.NA)
100 loops, best of 3: 4.72 ms per loop

In [6]: timeit arr / np.NA
100 loops, best of 3: 8.71 ms per loop

Here are the same tests, but after giving 'arr' an NA mask:

In [8]: timeit spdiv(arr, 3.1)
100 loops, best of 3: 14.2 ms per loop

In [9]: timeit arr / 3.1
10 loops, best of 3: 20.1 ms per loop

In [10]: timeit spdiv(arr, np.NA)
100 loops, best of 3: 4.02 ms per loop

In [11]: timeit arr / np.NA
100 loops, best of 3: 8.69 ms per loop

Another thought is to compare sum to count_nonzero, which is implemented in
a straightforward fashion without the masked wrapping mechanism that's in
the ufuncs.

In [12]: arr[::2] = np.NA

In [13]: np.count_nonzero(arr)
Out[13]: NA(dtype='int64')

In [14]: np.count_nonzero(arr, skipna=True)
Out[14]: 500000

In [15]: timeit np.count_nonzero(arr, skipna=True)
100 loops, best of 3: 5.86 ms per loop

In [16]: timeit np.sum(arr, skipna=True)
10 loops, best of 3: 16.1 ms per loop

In [17]: timeit np.count_nonzero(arr, skipna=False)
100 loops, best of 3: 1.85 ms per loop

In [18]: timeit np.sum(arr, skipna=False)
100 loops, best of 3: 1.86 ms per loop

Cheers,
Mark

>
> best,
> Wes
```
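
The comparison in this thread (a sum that skips masked elements versus a NaN-aware sum) can be roughly reproduced on a NumPy build that lacks the experimental `maskna` / `np.NA` / `skipna` API used above. The following is only a sketch under that assumption: a separate boolean array stands in for the NA mask, `np.sum(..., where=...)` stands in for `sum(skipna=True)`, and `np.nansum` stands in for `bn.nansum`. The helper names `make_data`, `masked_sum`, `masked_count_nonzero`, and `best_ms` are hypothetical and not part of NumPy or bottleneck. Timings from it are not comparable to the 2011 numbers quoted in the message.

```python
import timeit

import numpy as np


def make_data(n=1_000_000, seed=0):
    """Return (values, mask, nan_values); mask is True where a value is 'missing'."""
    rng = np.random.default_rng(seed)
    values = rng.standard_normal(n)
    mask = np.zeros(n, dtype=bool)
    mask[::2] = True                 # every other element is treated as missing
    nan_values = values.copy()
    nan_values[mask] = np.nan        # NaN-encoded version of the same data
    return values, mask, nan_values


def masked_sum(values, mask):
    """Sum only the unmasked elements (stand-in for a skipna-style sum)."""
    return np.sum(values, where=~mask)


def masked_count_nonzero(values, mask):
    """Count nonzero elements among the unmasked ones."""
    return np.count_nonzero(values[~mask])


def best_ms(func, repeat=3, number=10):
    """Best-of-`repeat` time per call, in milliseconds."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number * 1e3


if __name__ == "__main__":
    values, mask, nan_values = make_data()
    print("masked sum   : %.2f ms" % best_ms(lambda: masked_sum(values, mask)))
    print("np.nansum    : %.2f ms" % best_ms(lambda: np.nansum(nan_values)))
    print("masked count : %.2f ms" % best_ms(lambda: masked_count_nonzero(values, mask)))
```

The mask-based and NaN-based sums should agree up to floating-point rounding, so any difference in timing reflects the cost of the masking mechanism rather than a difference in the result.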