[Numpy-discussion] Histogram does not preserve subclasses of ndarray (e.g. masked arrays)
Thu Sep 2 15:47:57 CDT 2010
On 09/02/2010 02:50 PM, Joe Kington wrote:
> Hi all,
> I just wanted to check if this would be considered a bug.
> numpy.histogram does not appear to preserve subclasses of ndarrays
> (e.g. masked arrays). This leads to considerable problems when
> working with masked arrays. (As per this Stack Overflow question
> import numpy as np
> x = np.arange(100)
> x = np.ma.masked_where(x > 30, x)
> counts, bin_edges = np.histogram(x)
> counts --> array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
> bin_edges --> array([ 0. , 9.9, 19.8, 29.7, 39.6, 49.5, 59.4,
> 69.3, 79.2, 89.1, 99. ])
> I would have expected histogram to ignore the masked portion of the
> data. Is this a bug, or expected behavior? I'll open a bug report,
> if it's not expected behavior...
> This would appear to be easily fixed by using asanyarray rather than
> asarray within histogram. E.g. this diff for numpy/lib/function_base.py
> Index: function_base.py
> --- function_base.py (revision 8604)
> +++ function_base.py (working copy)
> @@ -132,9 +132,9 @@
> - a = asarray(a)
> + a = asanyarray(a)
> if weights is not None:
> - weights = asarray(weights)
> + weights = asanyarray(weights)
> if np.any(weights.shape != a.shape):
> raise ValueError(
> 'weights should have the same shape as a.')
> @@ -156,7 +156,7 @@
> mx += 0.5
> bins = linspace(mn, mx, bins+1, endpoint=True)
> - bins = asarray(bins)
> + bins = asanyarray(bins)
> if (np.diff(bins) < 0).any():
> raise AttributeError(
> 'bins must increase monotonically.')
I would not call it a bug, as this is a known 'feature' of functions that
use np.asarray(). You are welcome to file an enhancement request, but there
are some issues that need to be addressed first.
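To make the asarray() behaviour concrete, here is a minimal sketch using the array from the original message. It only illustrates how the two conversion functions treat a masked array; it is not a claim about what histogram does internally beyond the asarray call shown in the diff above.

```python
import numpy as np

x = np.arange(100)
x = np.ma.masked_where(x > 30, x)

# asarray() returns a base ndarray, so the mask is silently dropped.
plain = np.asarray(x)
# asanyarray() passes ndarray subclasses through, so the mask survives.
kept = np.asanyarray(x)

print(type(plain).__name__)   # ndarray
print(type(kept).__name__)    # MaskedArray
print(plain.max(), kept.max())  # 99 30
```

The maximum of 99 from the plain array is why the bin edges in the original example run up to 99 even though everything above 30 is masked.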
Typical questions that come to mind are:
1) Should a user be warned that the input is a masked array?
2) Should histogram count the number of masked values?
3) What is the expected output when normed=True?
4) What type of array should the weights and bins arguments be?
5) What are the dimensions of the weights and bins arguments, since bins
only needs to have the number of bins?
6) If the input array is masked, should the weights and bins arguments
also be masked arrays when applicable? If so, what happens if the masks
are in different locations between arrays?
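Until those questions are settled, one possible workaround is to make the choices explicitly on the caller's side. The sketch below is only one set of answers, not a proposal for histogram itself: masked values are dropped and counted separately (question 2), and when the weights carry their own mask, only entries unmasked in both arrays are kept (question 6). The weights array w is a made-up example.

```python
import numpy as np

x = np.ma.masked_where(np.arange(100) > 30, np.arange(100))

# Drop the masked entries explicitly before binning.
valid = x.compressed()
counts, bin_edges = np.histogram(valid)

# Report the masked values separately instead of binning them.
n_masked = np.ma.count_masked(x)

# Hypothetical weights with a mask in a different location than x's.
w = np.ma.masked_where(np.arange(100) < 5, np.ones(100))

# Assumption: keep only the entries that are unmasked in both arrays.
both = ~(np.ma.getmaskarray(x) | np.ma.getmaskarray(w))
wcounts, wedges = np.histogram(x.data[both], weights=w.data[both])

print(counts.sum(), n_masked)  # 31 69
print(wcounts.sum())           # 26.0
```

With this approach the bin edges span only the unmasked data (0 to 30 here), which is what the original poster expected.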