[SciPy-User] Question about boolean indexing in NumPy
Sat Nov 17 05:08:21 CST 2012
On Sat, 2012-11-17 at 11:19 +0100, Juan Luis Cano Rodríguez wrote:
> Hello everybody, I was trying to understand the use of boolean
> indexing in NumPy, and looking both in the User Guide and the
> Reference has left me a bit confused.
> On the one hand, the User Guide
> states that "Boolean arrays must be [...] broadcastable to the same
> shape", and then gives this example:
I guess this is probably not quite correct. In indexing you always give
indices from the first to last dimension, while broadcasting aligns from
the last to first dimension.
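
To make the difference concrete, here is a minimal sketch (the names row_mask and col_values are just for illustration, they are not from the docs). A length-5 mask indexes the first axis of a (5, 7) array without trouble, while broadcasting the same two arrays fails because broadcasting lines up the trailing axes:

```python
import numpy as np

y = np.arange(35).reshape(5, 7)
# Illustrative mask whose length matches y.shape[0], the FIRST axis.
row_mask = np.array([False, False, False, True, True])

# Indexing consumes axes from the first one: the mask selects rows.
rows = y[row_mask]
print(rows.shape)  # (2, 7)

# Broadcasting aligns from the LAST axis, so (5,) against (5, 7) fails.
try:
    np.broadcast_arrays(y, row_mask)
    broadcasts = True
except ValueError:
    broadcasts = False
print(broadcasts)  # False

# A length-7 array does broadcast, because it matches the last axis.
col_values = np.arange(7)
print(np.broadcast_arrays(y, col_values)[1].shape)  # (5, 7)
```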
Under the hood, it works basically like this:
1. indices = np.nonzero(boolean_index)
2. result = array[indices]
Note that indices is a tuple, so you are indexing multiple
dimensions, starting with the first dimension, not the last.
This makes sense because it also generalizes to `array[:, b]`,
which becomes `array[:, b.nonzero()[0], b.nonzero()[1]]` (assuming b is 2-d).
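
As a rough check of the two steps and of the sliced case, something like the following should work. The 3-d array a is a made-up example, and the boolean index here has exactly the shape of the axes it covers:

```python
import numpy as np

y = np.arange(35).reshape(5, 7)
b = y > 20                      # boolean index with the same shape as y

# Step 1: turn the boolean index into a tuple of integer index arrays.
indices = np.nonzero(b)         # (row_indices, col_indices)
# Step 2: index with that tuple, one integer array per dimension.
via_nonzero = y[indices]
via_mask = y[b]
print(np.array_equal(via_nonzero, via_mask))  # True

# The same idea when the boolean index follows a slice.
a = np.arange(2 * 5 * 7).reshape(2, 5, 7)   # hypothetical 3-d array
rows, cols = np.nonzero(b)
sliced_mask = a[:, b]
sliced_ints = a[:, rows, cols]
print(sliced_mask.shape)                          # (2, 14)
print(np.array_equal(sliced_mask, sliced_ints))   # True
```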
When thinking about broadcasting, you might also expect a length-1
axis to be repeated. This does not happen; if you want
that, you have to remove that axis and let normal indexing
do its job, i.e. not array[b] but array[:, b]. One small detail:
because np.nonzero-like functionality is used, the array shapes do
not have to match exactly (as long as the out-of-bounds values are all
False), but I personally think that one shouldn't rely on that.
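
A small sketch of the array[:, b] form, assuming a made-up mask called keep that starts out with a length-1 leading axis. Dropping that axis and letting the slice cover the first dimension applies the mask to every row:

```python
import numpy as np

y = np.arange(35).reshape(5, 7)
# Hypothetical mask with a length-1 leading axis, shape (1, 7).
keep = np.array([[True, False, True, False, False, False, True]])

# Remove the length-1 axis, then let the slice handle axis 0.
col_mask = keep[0]           # shape (7,)
selected = y[:, col_mask]    # columns 0, 2 and 6 of every row

print(selected.shape)  # (5, 3)
print(selected[0])     # [0 2 6]
```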
I hope this clarifies things for you,
> >>> y = np.arange(35).reshape(5,7)
> >>> b = y > 20
> >>> b[:,5] # use a 1-D boolean that broadcasts with y
> array([False, False, False, True, True], dtype=bool)
> >>> y[b[:,5]]
> array([[21, 22, 23, 24, 25, 26, 27],
>        [28, 29, 30, 31, 32, 33, 34]])
> but in fact these two arrays don't broadcast! Though they do if you
> transpose one of them:
> >>> np.broadcast_arrays(y, b[:, 5])
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/lib/python3.3/site-packages/numpy/lib/stride_tricks.py", line 94, in broadcast_arrays
>     "incompatible dimensions on axis %r." % (axis,))
> ValueError: shape mismatch: two or more arrays have incompatible dimensions on axis 1.
> On the other hand, it is also stated that "Boolean arrays must be of
> the same shape as the array being indexed", but in fact this is not
> true, because according to the reference guide
> "This advanced indexing [...] is always equivalent to [...]
> x[obj.nonzero()]". So I can come up with an example:
> >>> y[0, :]
> array([0, 1, 2, 3, 4, 5, 6])
> >>> y[0, :].shape
> (7,)
> >>> np.array([True, False, True, False]).shape # Not the same shape!
> (4,)
> >>> y[0, :][np.array([True, False, True, False])] # But actually it works -> .nonzero()
> array([0, 2])
> Is there anything that I am misunderstanding? Can anyone shed some
> light on this topic?
> Thanks in advance, regards
> Juan Luis Cano