[Numpy-discussion] Histogram does not preserve subclasses of ndarray (e.g. masked arrays)
Joe Kington
jkington@wisc....
Thu Sep 2 18:33:47 CDT 2010
On Thu, Sep 2, 2010 at 5:31 PM, <josef.pktd@gmail.com> wrote:
> On Thu, Sep 2, 2010 at 3:50 PM, Joe Kington <jkington@wisc.edu> wrote:
> > Hi all,
> >
> > I just wanted to check if this would be considered a bug.
> >
> > numpy.histogram does not appear to preserve subclasses of ndarrays (e.g.
> > masked arrays). This leads to considerable problems when working with
> > masked arrays. (As per this Stack Overflow question)
> >
> > E.g.
> >
> > import numpy as np
> > x = np.arange(100)
> > x = np.ma.masked_where(x > 30, x)
> >
> > counts, bin_edges = np.histogram(x)
> >
> > yields:
> > counts --> array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
> > bin_edges --> array([ 0. , 9.9, 19.8, 29.7, 39.6, 49.5, 59.4,
> > 69.3, 79.2, 89.1, 99. ])
> >
> > I would have expected histogram to ignore the masked portion of the data.
> > Is this a bug, or expected behavior? I'll open a bug report, if it's not
> > expected behavior...
>
> If you want to ignore masked data it's just one extra function call:
>
> histogram(m_arr.compressed())
>
> I don't think the fact that this makes an extra copy will be relevant,
> because I guess full masked array handling inside histogram will be a
> lot more expensive.
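
A minimal sketch of the workaround suggested above, applied to the array from the original report. The variable name `valid` is only illustrative. `compressed()` returns the unmasked values as a plain 1-D ndarray, so the bin range is computed from the valid data alone.

```python
import numpy as np

x = np.arange(100)
x = np.ma.masked_where(x > 30, x)   # values 31..99 are masked

# compressed() drops the masked entries and returns a plain 1-D ndarray
valid = x.compressed()

# The histogram now sees only the 31 unmasked values (0..30)
counts, bin_edges = np.histogram(valid)
```

The extra copy made by `compressed()` is the cost mentioned above; for arrays of this size it should be negligible.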
>
> Using asanyarray would also allow matrices in and other subtypes that
> might not be handled correctly by the histogram calculations.
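
To make the difference between the two conversion functions concrete, here is a short sketch (variable names are illustrative only). `asarray` coerces a subclass instance to a base ndarray, which for a masked array discards the mask, while `asanyarray` passes the subclass through unchanged.

```python
import numpy as np

x = np.ma.masked_where(np.arange(10) > 3, np.arange(10))

plain = np.asarray(x)     # base-class ndarray; the mask is gone
kept = np.asanyarray(x)   # still a MaskedArray, mask intact

is_plain_ndarray = type(plain) is np.ndarray
is_still_masked = isinstance(kept, np.ma.MaskedArray)
```

Passing the subclass through is only safe if every later operation in the function behaves sensibly for that subclass, which is the concern raised above.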
>
> For anything else besides dropping masked observations, it would be
> necessary to figure out what the masked array definition of a
> histogram is, as Bruce pointed out.
>
> (Another interesting question would be whether histogram handles NaNs
> correctly, e.g. in searchsorted?)
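
How histogram treats NaNs may differ between NumPy versions, so the sketch below does not rely on it. It simply drops NaNs before binning, in the same spirit as the `compressed()` call for masked data. The sample values are made up for illustration.

```python
import numpy as np

a = np.array([0.5, 1.5, np.nan, 2.5, np.nan, 3.5])

# Drop NaNs explicitly instead of depending on histogram's NaN behaviour
finite = a[~np.isnan(a)]

counts, bin_edges = np.histogram(finite, bins=4)
```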
>
> Josef
>
Good points all around. I'll skip the enhancement request. Sorry for the
noise!
Thanks!
-Joe
>
> >
> > This would appear to be easily fixed by using asanyarray rather than
> > asarray within histogram. E.g. this diff for numpy/lib/function_base.py:
> > Index: function_base.py
> > ===================================================================
> > --- function_base.py (revision 8604)
> > +++ function_base.py (working copy)
> > @@ -132,9 +132,9 @@
> >
> >      """
> >
> > -    a = asarray(a)
> > +    a = asanyarray(a)
> >      if weights is not None:
> > -        weights = asarray(weights)
> > +        weights = asanyarray(weights)
> >          if np.any(weights.shape != a.shape):
> >              raise ValueError(
> >                  'weights should have the same shape as a.')
> > @@ -156,7 +156,7 @@
> >              mx += 0.5
> >          bins = linspace(mn, mx, bins+1, endpoint=True)
> >      else:
> > -        bins = asarray(bins)
> > +        bins = asanyarray(bins)
> >          if (np.diff(bins) < 0).any():
> >              raise AttributeError(
> >                  'bins must increase monotonically.')
> >
> > Thanks!
> > -Joe
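
For readers who want the "ignore masked values" behaviour without changing NumPy itself, a wrapper along the following lines may be enough. This is only a sketch: `histogram_ignoring_mask` is a hypothetical helper, not part of NumPy, and it assumes that any mask on `weights` can be ignored. Its one advantage over calling `compressed()` by hand is that the weights are filtered with the same mask as the data, so the two stay aligned.

```python
import numpy as np

def histogram_ignoring_mask(a, bins=10, range=None, weights=None):
    """Hypothetical helper: histogram of the unmasked values of `a`."""
    a = np.ma.asarray(a)
    keep = ~np.ma.getmaskarray(a)        # True where the data is valid
    data = np.asarray(a.data)[keep]      # plain 1-D ndarray of valid values
    if weights is not None:
        weights = np.ma.asarray(weights)
        if weights.shape != a.shape:
            raise ValueError('weights should have the same shape as a.')
        # Apply the data mask to the weights so they stay aligned
        weights = np.asarray(weights.data)[keep]
    return np.histogram(data, bins=bins, range=range, weights=weights)

x = np.ma.masked_where(np.arange(100) > 30, np.arange(100))
w = np.ones(100)
counts, bin_edges = histogram_ignoring_mask(x, weights=w)
```

The returned counts and bin edges are ordinary ndarrays, exactly as `np.histogram` produces them.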
> >
> >
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >