[NumPy-Tickets] [NumPy] #1447: Inconsistent behavior when indexing a masked array with a bool array
NumPy Trac
numpy-tickets@scipy....
Mon Apr 5 12:33:11 CDT 2010
#1447: Inconsistent behavior when indexing a masked array with a bool array
--------------------------------+-------------------------------------------
Reporter: nathanielpeterson | Owner: somebody
Type: defect | Status: closed
Priority: normal | Milestone:
Component: Other | Version: 1.3.0
Resolution: invalid | Keywords: bool-indexed masked array
--------------------------------+-------------------------------------------
Changes (by pierregm):
* status: new => closed
* resolution: => invalid
Comment:
On Apr 2, 2010, at 1:08 AM, Nathaniel Peterson wrote:
''Is this behavior of masked arrays intended, or is it a bug''?
It's not a bug, it's an unfortunate side effect of using boolean masked
arrays for indices. Don't. Instead, you should fill the masked arrays with
either True or False (depending on what you want).
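A minimal sketch of that advice, assuming hand-built arrays that stand in for the ones discussed in this ticket (the names `drop` and `keep` are illustrative only). Calling `filled` on the boolean masked array turns each masked entry into an explicit True or False before it is used as an index:

```python
import numpy as np

# Data with one masked entry (index 0), standing in for a fix_invalid result.
a = np.ma.array([1e20, -1.0, 0.0, 1.0], mask=[True, False, False, False])

# A boolean masked array, such as a comparison of masked arrays returns.
idx = np.ma.array([True, True, True, True], mask=[True, False, False, False])

# Decide explicitly what a masked entry of the index should mean.
drop = a[idx.filled(False)]  # masked entries of idx select nothing
keep = a[idx.filled(True)]   # masked entries of idx select the element

print(drop.shape)  # (3,)
print(keep.shape)  # (4,)
```

Which fill value is right depends on whether a masked comparison should count as a match or not; the point is only that the choice is made visibly rather than left to the hidden data.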
Now, for some explanations:
{{{
import numpy as np
a=np.ma.fix_invalid(np.array([np.nan,-1,0,1]))
b=np.ma.fix_invalid(np.array([np.nan,-1,0,1]))
}}}
When using {{{ma.fix_invalid}}}, the nans and infs are masked and the
corresponding data are set to a default fill value (1e+20 for floats).
Thus, you have:
{{{
print(a.data)
# [ 1.00000000e+20 -1.00000000e+00 0.00000000e+00 1.00000000e+00]
idx=(a==b)
}}}
Now, you compare two masked arrays. In practice, the arrays are first
filled with 0, compared, and the mask is created afterwards. In the
current case, we get a new masked array, whose first entry is masked
(because a[0] is masked), and because the two underlying ndarrays are
identical, the underlying ndarray of the result is {{{[True True True
True]}}}.
{{{
print(a[idx][3])
# 1.0
}}}
The fun starts now: you are using idx, a masked array, as indices. Because
the fancy indexing mechanism of numpy doesn't know how to process masked
arrays, the underlying ndarray is used instead and the mask is ignored.
Consider a[idx] equivalent to a[np.array(idx)]. Because {{{np.array(idx) ==
idx.data == [True True True True]}}}, a[idx] returns all of a, hence the
(4,) shape.
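The equivalence can be checked with a short sketch, assuming hand-built arrays in place of the ones from the ticket (`via_masked` and `via_data` are illustrative names). It indexes once with the masked array and once with its underlying ndarray, then compares the shapes:

```python
import numpy as np

a = np.ma.array([1e20, -1.0, 0.0, 1.0], mask=[True, False, False, False])
idx = np.ma.array([True, True, True, True], mask=[True, False, False, False])

# Index once with the masked array itself and once with its plain ndarray.
via_masked = a[idx]
via_data = a[np.asarray(idx)]

# If the mask of idx plays no role, both selections have the same shape.
print(via_masked.shape == via_data.shape)
```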
''But if I change the first element of b from np.nan to 2.0 then
a[idx2] has shape (3,) despite np.alltrue(idx==idx2) being True:''
{{{
c=np.ma.fix_invalid(np.array([2.0,-1,0,1]))
idx2=(a==c)
}}}
So, c is a masked array without any masked values. When comparing a and c,
the arrays are once again filled with 0 before the comparison. The ndarray
underlying idx2 is therefore [False True True True], and the first item is
masked (still because a[0] is masked). If you use idx2 for indexing, it's
converted to an ndarray, and you end up with the last three items of a
(hence the (3,) shape).
{{{
assert(np.alltrue(idx==idx2))
}}}
You compare the two masked arrays idx and idx2. Remember the filling with
0 that happens under the hood: you end up comparing [False True True True]
with [False True True True], so np.alltrue of course returns True, even
though the underlying ndarrays of idx and idx2 differ in their first item.
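To see whether two boolean masked arrays really agree, the data and the mask can be compared separately. A small sketch, assuming hand-built stand-ins for idx and idx2 (same mask, different hidden first item; `same_data` and `same_mask` are illustrative names):

```python
import numpy as np

# Stand-ins for idx and idx2: identical masks, different underlying data.
idx = np.ma.array([True, True, True, True], mask=[True, False, False, False])
idx2 = np.ma.array([False, True, True, True], mask=[True, False, False, False])

# Compare the underlying ndarrays and the masks independently.
same_data = np.array_equal(np.ma.getdata(idx), np.ma.getdata(idx2))
same_mask = np.array_equal(np.ma.getmaskarray(idx), np.ma.getmaskarray(idx2))

print(same_data)  # False
print(same_mask)  # True
```

The differing data is exactly what fancy indexing sees, which is why the two index arrays select a different number of elements.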
Moral of the story: don't use masked arrays in fancy indexing, as you may
not get what you expect.
I hope it clarified the situation a bit, but don't hesitate to ask more
questions.
Cheers
P.
--
Ticket URL: <http://projects.scipy.org/numpy/ticket/1447#comment:1>
NumPy <http://projects.scipy.org/numpy>