Joe Kington
jkington@wisc....
Thu Sep 2 14:50:08 CDT 2010
Hi all,
I just wanted to check if this would be considered a bug.
numpy.histogram does not appear to preserve subclasses of ndarrays (e.g.
masked arrays). This leads to considerable problems when working with
masked arrays. (See this Stack Overflow question:
<http://stackoverflow.com/questions/3610040/how-to-create-the-histogram-of-an-array-with-masked-values-in-numpy>)
E.g.
import numpy as np
x = np.arange(100)
x = np.ma.masked_where(x > 30, x)
counts, bin_edges = np.histogram(x)
yields:
counts --> array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
bin_edges --> array([ 0. , 9.9, 19.8, 29.7, 39.6, 49.5, 59.4,
69.3, 79.2, 89.1, 99. ])
I would have expected histogram to ignore the masked portion of the data,
i.e. to bin only the 31 unmasked values (0 through 30).
Is this a bug, or expected behavior? I'll open a bug report if it's not
expected behavior.
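In the meantime, here is a minimal workaround sketch. It is not part of the proposed patch, and it assumes that dropping the masked values before the call is acceptable (the variable name `unmasked` is just for illustration):

```python
import numpy as np

x = np.arange(100)
x = np.ma.masked_where(x > 30, x)

# compressed() returns the unmasked values as a plain 1-D ndarray,
# so histogram never sees the masked entries.
unmasked = x.compressed()
counts, bin_edges = np.histogram(unmasked)

print(counts.sum())                 # number of values that were binned
print(bin_edges[0], bin_edges[-1])  # bin range now spans only the unmasked data
```

With this, the bin edges are computed from the unmasked data only, rather than from the full 0 to 99 range.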
The problem would appear to be easily fixed by using asanyarray rather than
asarray within histogram, for example with this diff for
numpy/lib/function_base.py:
Index: function_base.py
===================================================================
--- function_base.py (revision 8604)
+++ function_base.py (working copy)
@@ -132,9 +132,9 @@
     """
-    a = asarray(a)
+    a = asanyarray(a)
     if weights is not None:
-        weights = asarray(weights)
+        weights = asanyarray(weights)
         if np.any(weights.shape != a.shape):
             raise ValueError(
                     'weights should have the same shape as a.')
@@ -156,7 +156,7 @@
             mx += 0.5
         bins = linspace(mn, mx, bins+1, endpoint=True)
     else:
-        bins = asarray(bins)
+        bins = asanyarray(bins)
         if (np.diff(bins) < 0).any():
             raise AttributeError(
                     'bins must increase monotonically.')
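To illustrate why the choice of conversion function matters, here is a small sketch of how asarray and asanyarray treat a masked array. It only shows the conversion step; whether the rest of histogram then handles the mask as expected is not demonstrated here. The names `plain` and `kept` are just for illustration:

```python
import numpy as np

x = np.ma.masked_where(np.arange(100) > 30, np.arange(100))

plain = np.asarray(x)    # converted to a base-class ndarray; the mask is dropped
kept = np.asanyarray(x)  # ndarray subclasses are passed through unchanged

# The maximum differs because only the masked array skips masked entries.
print(type(plain).__name__, plain.max())
print(type(kept).__name__, kept.max())
```

Since histogram derives the default bin range from the minimum and maximum of the input, losing the mask at the conversion step is enough to change the bin edges.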
Thanks!
-Joe
