[Numpy-discussion] NA masks for NumPy are ready to test
Mark Wiebe
mwwiebe@gmail....
Thu Aug 18 16:43:17 CDT 2011
It's taken a lot of changes to get the NA mask support to its current point,
but the code is ready for some testing now. You can read the work-in-progress
release notes here:
https://github.com/m-paradox/numpy/blob/missingdata/doc/release/2.0.0-notes.rst
To try it out, check out the missingdata branch from my GitHub account,
here, and build it in the standard way:
https://github.com/m-paradox/numpy
The things most important to test are:
* Confirm that existing code still works correctly. I've tested against
SciPy and matplotlib.
* Confirm that the performance of code not using NA masks is the same or
better.
* Try to do computations with the NA values, find places they don't work
yet, and nominate unimplemented functionality important to you to be next on
the development list. The release notes have a preliminary list of
implemented/unimplemented functions.
* Report any crashes, build problems, or unexpected behaviors.
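For the performance item above, one rough way to compare builds is to run the same small timing script under the released NumPy and under the missingdata branch, then compare the numbers. The sketch below is only an illustration: the helper name `best_time`, the array size, and the choice of statements are assumptions, not anything prescribed in this message.

```python
import timeit

import numpy as np

# Record which build produced these numbers.
print("numpy", np.__version__)


def best_time(stmt, setup, repeat=3, number=100):
    """Return the best observed per-call time in seconds for stmt."""
    timings = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    return min(timings) / number


# Plain arrays only: no NA mask is involved anywhere in this script.
setup = "import numpy as np; a = np.random.rand(200, 200)"
cases = ["a.sum(axis=0)", "a.mean(axis=-1)", "a.std(axis=-1)"]

results = {stmt: best_time(stmt, setup) for stmt in cases}
for stmt in cases:
    print("%-18s %8.1f us per call" % (stmt, results[stmt] * 1e6))
```

Taking the minimum over several repeats reduces noise from other processes, but small differences between two runs should still be treated with caution.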
In addition to adding the NA mask, I've also added a few features and made
some performance changes here and there, such as letting reductions like sum
take a list of axes instead of only a single axis or all of them. These
changes affect various bugs like http://projects.scipy.org/numpy/ticket/1143
and http://projects.scipy.org/numpy/ticket/533.
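As a minimal sketch of the multi-axis reduction described above: the array and variable names are made up for illustration, and the axes are passed as a tuple, matching the `axis=(1,2)` call in the example run in this message.

```python
import numpy as np

a = np.arange(24, dtype=float).reshape(2, 3, 4)

# Reduce over the last two axes in one call; one result per leading index.
s = a.sum(axis=(1, 2))

# The same result from chaining two single-axis reductions.
chained = a.sum(axis=2).sum(axis=1)

assert np.array_equal(s, chained)
print(s.shape)  # (2,)
```

A single call avoids building the intermediate array that the chained form creates.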
Thanks!
Mark
Here's a small example run using NAs:
>>> import numpy as np
>>> np.__version__
'2.0.0.dev-8a5e2a1'
>>> a = np.random.rand(3,3,3)
>>> a.flags.maskna = True
>>> a[np.random.rand(3,3,3) < 0.5] = np.NA
>>> a
array([[[NA, NA, 0.11511708],
        [ 0.46661454, 0.47565512, NA],
        [NA, NA, NA]],

       [[NA, 0.57860351, NA],
        [NA, NA, 0.72012669],
        [ 0.36582123, NA, 0.76289794]],

       [[ 0.65322748, 0.92794386, NA],
        [ 0.53745165, 0.97520989, 0.17515083],
        [ 0.71219688, 0.5184328 , 0.75802805]]])
>>> np.mean(a, axis=-1)
array([[NA, NA, NA],
       [NA, NA, NA],
       [NA, 0.56260412, 0.66288591]])
>>> np.std(a, axis=-1)
array([[NA, NA, NA],
       [NA, NA, NA],
       [NA, 0.32710662, 0.10384331]])
>>> np.mean(a, axis=-1, skipna=True)
/home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2474: RuntimeWarning: invalid value encountered in true_divide
  um.true_divide(ret, rcount, out=ret, casting='unsafe')
array([[ 0.11511708, 0.47113483, nan],
       [ 0.57860351, 0.72012669, 0.56435958],
       [ 0.79058567, 0.56260412, 0.66288591]])
>>> np.std(a, axis=-1, skipna=True)
/home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2707: RuntimeWarning: invalid value encountered in true_divide
  um.true_divide(arrmean, rcount, out=arrmean, casting='unsafe')
/home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2730: RuntimeWarning: invalid value encountered in true_divide
  um.true_divide(ret, rcount, out=ret, casting='unsafe')
array([[ 0. , 0.00452029, nan],
       [ 0. , 0. , 0.19853835],
       [ 0.13735819, 0.32710662, 0.10384331]])
>>> np.std(a, axis=(1,2), skipna=True)
array([ 0.16786895, 0.15498008, 0.23811937])
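The session above needs the missingdata branch, since `np.NA`, `flags.maskna` and `skipna` come from that branch. For comparison on a NumPy build without it, the two behaviours shown (propagating missing values versus skipping them) can be roughly imitated with `numpy.ma`. This is only a sketch under that assumption: `numpy.ma` is a different mechanism from NA masks, and the seed and variable names are arbitrary.

```python
import numpy as np

rng = np.random.RandomState(0)
data = rng.rand(3, 3, 3)
mask = rng.rand(3, 3, 3) < 0.5  # True marks an element as missing

a = np.ma.masked_array(data, mask=mask)

# Masked reductions ignore missing elements, roughly like skipna=True.
# A slice with no valid elements comes back masked rather than as nan.
mean_skip = a.mean(axis=-1)
std_skip = a.std(axis=-1)

# To propagate instead (like the default NA behaviour in the session above),
# mask every output whose input slice contains any missing element.
any_missing = mask.any(axis=-1)
mean_propagate = np.ma.masked_array(data.mean(axis=-1), mask=any_missing)

print(mean_propagate)
print(mean_skip)
```

Masked entries print as `--` here, where the NA-mask branch prints `NA`.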