[Numpy-discussion] NA masks for NumPy are ready to test

Mark Wiebe mwwiebe@gmail....
Thu Aug 18 16:43:17 CDT 2011


It's taken a lot of changes to get the NA mask support to its current point,
but the code is ready for some testing now. You can read the work-in-progress
release notes here:

https://github.com/m-paradox/numpy/blob/missingdata/doc/release/2.0.0-notes.rst

To try it out, check out the missingdata branch from my GitHub account
(linked below) and build in the standard way:

https://github.com/m-paradox/numpy

The things most important to test are:

* Confirm that existing code still works correctly. I've tested against
SciPy and matplotlib.
* Confirm that the performance of code not using NA masks is the same or
better.
* Try to do computations with the NA values, find places they don't work
yet, and nominate unimplemented functionality important to you to be next on
the development list. The release notes have a preliminary list of
implemented/unimplemented functions.
* Report any crashes, build problems, or unexpected behaviors.

In addition to adding the NA mask, I've also added features and made a few
performance changes here and there, such as letting reductions like sum take
a list of axes instead of only a single axis or all of them. These changes
affect various bugs like http://projects.scipy.org/numpy/ticket/1143 and
http://projects.scipy.org/numpy/ticket/533.
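
As a rough illustration of the multi-axis reduction mentioned above, the
sketch below passes a tuple of axes to sum and compares it with chaining two
single-axis reductions. The array contents and variable names are made up for
the example, and it assumes a NumPy build where the axis argument accepts a
tuple, as in the axis=(1,2) call in the example run.

```python
import numpy as np

# A small 3-D array with known contents: values 0..23 in shape (2, 3, 4).
a = np.arange(24, dtype=float).reshape(2, 3, 4)

# Reduce over the last two axes in one call by passing a tuple of axes.
total = np.sum(a, axis=(1, 2))

# The same result obtained by chaining two single-axis reductions.
chained = a.sum(axis=2).sum(axis=1)

print(total)    # [ 66. 210.]
print(chained)  # [ 66. 210.]
```

Either form gives one value per entry along axis 0; the tuple form avoids the
intermediate array.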

Thanks!
Mark

Here's a small example run using NAs:

>>> import numpy as np
>>> np.__version__
'2.0.0.dev-8a5e2a1'
>>> a = np.random.rand(3,3,3)
>>> a.flags.maskna = True
>>> a[np.random.rand(3,3,3) < 0.5] = np.NA
>>> a
array([[[NA, NA,  0.11511708],
        [ 0.46661454,  0.47565512, NA],
        [NA, NA, NA]],

       [[NA,  0.57860351, NA],
        [NA, NA,  0.72012669],
        [ 0.36582123, NA,  0.76289794]],

       [[ 0.65322748,  0.92794386, NA],
        [ 0.53745165,  0.97520989,  0.17515083],
        [ 0.71219688,  0.5184328 ,  0.75802805]]])
>>> np.mean(a, axis=-1)
array([[NA, NA, NA],
       [NA, NA, NA],
       [NA,  0.56260412,  0.66288591]])
>>> np.std(a, axis=-1)
array([[NA, NA, NA],
       [NA, NA, NA],
       [NA,  0.32710662,  0.10384331]])
>>> np.mean(a, axis=-1, skipna=True)
/home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2474: RuntimeWarning: invalid value encountered in true_divide
  um.true_divide(ret, rcount, out=ret, casting='unsafe')
array([[ 0.11511708,  0.47113483,         nan],
       [ 0.57860351,  0.72012669,  0.56435958],
       [ 0.79058567,  0.56260412,  0.66288591]])
>>> np.std(a, axis=-1, skipna=True)
/home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2707: RuntimeWarning: invalid value encountered in true_divide
  um.true_divide(arrmean, rcount, out=arrmean, casting='unsafe')
/home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2730: RuntimeWarning: invalid value encountered in true_divide
  um.true_divide(ret, rcount, out=ret, casting='unsafe')
array([[ 0.        ,  0.00452029,         nan],
       [ 0.        ,  0.        ,  0.19853835],
       [ 0.13735819,  0.32710662,  0.10384331]])
>>> np.std(a, axis=(1,2), skipna=True)
array([ 0.16786895,  0.15498008,  0.23811937])