[Numpy-discussion] Histogram does not preserve subclasses of ndarray (e.g. masked arrays)

Joe Kington jkington@wisc....
Thu Sep 2 18:33:47 CDT 2010


On Thu, Sep 2, 2010 at 5:31 PM, <josef.pktd@gmail.com> wrote:

> On Thu, Sep 2, 2010 at 3:50 PM, Joe Kington <jkington@wisc.edu> wrote:
> > Hi all,
> >
> > I just wanted to check if this would be considered a bug.
> >
> > numpy.histogram does not appear to preserve subclasses of ndarrays (e.g.
> > masked arrays).  This leads to considerable problems when working with
> > masked arrays. (As per this Stack Overflow question)
> >
> > E.g.
> >
> > import numpy as np
> > x = np.arange(100)
> > x = np.ma.masked_where(x > 30, x)
> >
> > counts, bin_edges = np.histogram(x)
> >
> > yields:
> > counts --> array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
> > bin_edges --> array([ 0. ,  9.9, 19.8, 29.7, 39.6, 49.5, 59.4, 69.3,
> >                      79.2, 89.1, 99. ])
> >
> > I would have expected histogram to ignore the masked portion of the data.
> > Is this a bug, or expected behavior?  I'll open a bug report, if it's not
> > expected behavior...
>
> If you want to ignore masked data, it's just one extra function call:
>
> histogram(m_arr.compressed())
>
> I don't think the fact that this makes an extra copy will be relevant,
> because I guess full masked array handling inside histogram will be a
> lot more expensive.
>
> Using asanyarray would also allow matrices in and other subtypes that
> might not be handled correctly by the histogram calculations.
>
> For anything else besides dropping masked observations, it would be
> necessary to figure out what the masked array definition of a
> histogram is, as Bruce pointed out.
>
> (Another interesting question is whether histogram handles NaNs
> correctly. What does searchsorted do with them?)
>
> Josef
>

Good points all around.  I'll skip the enhancement request.  Sorry for the
noise!
Thanks!
-Joe
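
For readers of the archive, the compressed() workaround Josef suggests above
can be sketched roughly as follows. The variable names are illustrative only,
and the NaN part is just one possible way to sidestep his open question by
dropping invalid values first; it says nothing about what histogram itself
does with NaNs.

```python
import numpy as np

# Same data as in the original report: values above 30 are masked.
x = np.arange(100)
masked = np.ma.masked_where(x > 30, x)

# Passing the masked array directly: per the report above, the mask is ignored.
counts_all, edges_all = np.histogram(masked)

# compressed() returns a plain 1-D ndarray holding only the unmasked values.
counts_valid, edges_valid = np.histogram(masked.compressed())

print(counts_all.sum())
print(counts_valid.sum())                # 31 unmasked values (0..30)
print(edges_valid[0], edges_valid[-1])   # range now spans 0 to 30

# Assumption: NaNs should simply be dropped. masked_invalid() masks NaN/inf,
# and compressed() then removes them before binning.
y = np.array([1.0, 2.0, np.nan, 4.0])
finite = np.ma.masked_invalid(y).compressed()
counts_finite, edges_finite = np.histogram(finite, bins=3)
print(counts_finite)
```

This costs one extra copy of the unmasked data, which is the trade-off Josef
mentions.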


>
> >
> > This would appear to be easily fixed by using asanyarray rather than
> > asarray within histogram.  E.g. this diff for numpy/lib/function_base.py:
> > Index: function_base.py
> > ===================================================================
> > --- function_base.py    (revision 8604)
> > +++ function_base.py    (working copy)
> > @@ -132,9 +132,9 @@
> >
> >      """
> >
> > -    a = asarray(a)
> > +    a = asanyarray(a)
> >      if weights is not None:
> > -        weights = asarray(weights)
> > +        weights = asanyarray(weights)
> >          if np.any(weights.shape != a.shape):
> >              raise ValueError(
> >                      'weights should have the same shape as a.')
> > @@ -156,7 +156,7 @@
> >              mx += 0.5
> >          bins = linspace(mn, mx, bins+1, endpoint=True)
> >      else:
> > -        bins = asarray(bins)
> > +        bins = asanyarray(bins)
> >          if (np.diff(bins) < 0).any():
> >              raise AttributeError(
> >                      'bins must increase monotonically.')
> >
> > Thanks!
> > -Joe
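
The difference the diff above relies on can be seen in a small sketch (not
part of the original patch; the names are illustrative). asarray returns a
base ndarray and discards the mask, while asanyarray passes ndarray
subclasses through unchanged. As noted in the reply, passing the subclass
through does not by itself guarantee that the rest of histogram handles it
correctly.

```python
import numpy as np

# A small masked array: values above 3 are masked.
data = np.arange(10)
x = np.ma.masked_where(data > 3, data)

plain = np.asarray(x)     # base ndarray; the mask is dropped
kept = np.asanyarray(x)   # subclass preserved; still a MaskedArray

print(type(plain).__name__)   # ndarray
print(type(kept).__name__)    # MaskedArray

# Reductions differ accordingly: masked values are skipped only in `kept`.
print(plain.max())   # 9
print(kept.max())    # 3
```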
> >
> >
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> >