[Numpy-discussion] Histogram does not preserve subclasses of ndarray (e.g. masked arrays)

Joe Kington jkington@wisc....
Thu Sep 2 14:50:08 CDT 2010


Hi all,

I just wanted to check if this would be considered a bug.

numpy.histogram does not appear to preserve subclasses of ndarray (e.g.
masked arrays).  This leads to considerable problems when working with
masked arrays (see this Stack Overflow question:
http://stackoverflow.com/questions/3610040/how-to-create-the-histogram-of-an-array-with-masked-values-in-numpy).

E.g.

import numpy as np
x = np.arange(100)
x = np.ma.masked_where(x > 30, x)

counts, bin_edges = np.histogram(x)

yields:
counts --> array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
bin_edges --> array([  0. ,   9.9,  19.8,  29.7,  39.6,  49.5,  59.4,
                      69.3,  79.2,  89.1,  99. ])

I would have expected histogram to ignore the masked portion of the data.
Is this a bug, or expected behavior?  I'll open a bug report, if it's not
expected behavior...
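
In the meantime, one possible workaround is to hand histogram only the
unmasked values via the masked array's compressed() method.  This is just a
sketch, assuming that dropping the masked entries is all that is wanted; it
does not change how histogram itself treats masked input.

```python
import numpy as np

x = np.arange(100)
x = np.ma.masked_where(x > 30, x)

# compressed() returns a plain 1-D ndarray holding only the unmasked values,
# so histogram never sees the masked portion of the data.
counts, bin_edges = np.histogram(x.compressed())

print(counts.sum())    # number of unmasked values (0 through 30)
print(bin_edges[-1])   # upper edge now follows the unmasked maximum
```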

This would appear to be easily fixed by using asanyarray rather than asarray
within histogram.  E.g. this diff for numpy/lib/function_base.py:

Index: function_base.py
===================================================================
--- function_base.py    (revision 8604)
+++ function_base.py    (working copy)
@@ -132,9 +132,9 @@

     """

-    a = asarray(a)
+    a = asanyarray(a)
     if weights is not None:
-        weights = asarray(weights)
+        weights = asanyarray(weights)
         if np.any(weights.shape != a.shape):
             raise ValueError(
                     'weights should have the same shape as a.')
@@ -156,7 +156,7 @@
             mx += 0.5
         bins = linspace(mn, mx, bins+1, endpoint=True)
     else:
-        bins = asarray(bins)
+        bins = asanyarray(bins)
         if (np.diff(bins) < 0).any():
             raise AttributeError(
                     'bins must increase monotonically.')
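
To illustrate why the choice of conversion matters here, a small sketch
comparing the two functions on the example array above.  It only shows the
difference between asarray and asanyarray; it does not demonstrate that the
rest of histogram handles masked input correctly once the diff is applied.

```python
import numpy as np

x = np.arange(100)
x = np.ma.masked_where(x > 30, x)

# asarray converts to a base ndarray, so the mask is lost.
plain = np.asarray(x)
# asanyarray passes ndarray subclasses through, so the mask survives.
kept = np.asanyarray(x)

print(type(plain).__name__, plain.max())   # masked values count again
print(type(kept).__name__, kept.max())     # max over unmasked values only
```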

Thanks!
-Joe