[Numpy-discussion] What should np.ndarray.__contains__ do
Mon Feb 25 10:33:49 CST 2013
On Mon, Feb 25, 2013 at 3:10 PM, Sebastian Berg wrote:
> Hello all,
> currently the `__contains__` method (i.e. the `in` operator) on arrays
> does not return what the user would expect when, in the operation
> `a in b`, `a` is not a single element (see "In -" below).
True, I did not expect that!
> The first solution coming to mind might be checking `all()` for all
> dimensions given in argument `a` (see line "In " for a simplistic
> example). This does not play too well with broadcasting however, but one
> could maybe simply *not* broadcast at all (i.e. a.shape ==
> b.shape[b.ndim-a.ndim:]) and raise an error/return False otherwise.
> On the other hand one could say broadcasting of `a` onto `b` should be
> "any" along that dimension (see "In "). The other way should maybe
> raise an error though (see "In " to understand what I mean).
> I think using broadcasting dimensions where `a` is repeated over `b` as
> the dimensions to use "any" logic on is the most general way for numpy
> to handle this consistently, while the other way around could be handled
> with an `all` but to me makes so little sense that I think it should be
> an error. Of course this is different to a list of lists, which gives
> False in these cases, but arrays are not lists of lists...
> As a side note, since for loops, etc. use "for item in array", I do not
> think that vectorizing along `a` as np.in1d does is reasonable. `in`
> should return a single boolean.
Python effectively calls bool() on the return value from __contains__,
so reasonableness doesn't even come into it -- the only possible
behaviours for `in` are to return True, False, or raise an exception.
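A minimal sketch of that coercion in plain Python. The `Weird` and `Empty` classes are hypothetical and exist only for illustration; each returns a non-boolean object from `__contains__`, and the `in` expression still evaluates to a plain bool:

```python
class Weird:
    """Hypothetical container whose __contains__ returns a non-empty list."""

    def __contains__(self, item):
        # Not a bool: the `in` operator converts this to True.
        return [item, item]


class Empty:
    """Hypothetical container whose __contains__ returns an empty list."""

    def __contains__(self, item):
        # Not a bool either: the `in` operator converts this to False.
        return []


result_truthy = 1 in Weird()
result_falsy = 1 in Empty()
print(type(result_truthy), result_truthy, result_falsy)
```

So whatever an array's `__contains__` computes, only its truth value (or an exception) can reach the caller.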
I admit that I don't actually really understand any of this discussion
of broadcasting. in's semantics are, "is this scalar in this
container"? (And the scalarness is enforced by Python, as per above.)
So I think we should find some approach where the left argument is
treated as a scalar.
The two approaches that I can see, and which generalize the behaviour
of simple Python lists in natural ways, are:
a) the left argument is coerced to a scalar of the appropriate type,
then we check if that value appears anywhere in the array (basically
raveling the right argument).
b) for an array with shape (n1, n2, n3, ...), the left argument is
treated as an array of shape (n2, n3, ...), and we check if that
subarray (as a whole) appears anywhere in the array. Or in other
words, 'A in B' is true iff there is some i such that B[i] equals A
as a whole.
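The two options above might be sketched roughly as follows. `contains_scalar` and `contains_subarray` are hypothetical helper names, not NumPy functions, and reading (b) as a row-by-row `np.array_equal` test is an assumption based on the "as a whole" wording:

```python
import numpy as np


def contains_scalar(arr, value):
    """Option (a): does the scalar `value` equal any element of `arr`?"""
    arr = np.asarray(arr)
    # Coerce the left argument to the array's dtype (assumed behaviour).
    scalar = np.asarray(value, dtype=arr.dtype)
    if scalar.ndim != 0:
        raise TypeError("left operand must be a scalar")
    # Ravel the right argument and look for the value anywhere in it.
    return bool((arr.ravel() == scalar).any())


def contains_subarray(arr, sub):
    """Option (b): does `sub` equal arr[i] as a whole for some i?"""
    arr = np.asarray(arr)
    sub = np.asarray(sub)
    # The left argument must have shape (n2, n3, ...); otherwise no match.
    if arr.ndim == 0 or sub.shape != arr.shape[1:]:
        return False
    return any(np.array_equal(row, sub) for row in arr)


b = np.arange(10).reshape(5, 2)
scalar_hit = contains_scalar(b, 3)           # 3 is an element of b
scalar_miss = contains_scalar(b, 42)         # 42 is not
row_hit = contains_subarray(b, [0, 1])       # [0, 1] is the first row
row_miss = contains_subarray(b, [0, 2])      # no row equals [0, 2]
print(scalar_hit, scalar_miss, row_hit, row_miss)
```

Under this reading, `[0, 2]` is not "in" `b` for option (b), and is rejected as a non-scalar for option (a).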
Question 1: are there any other sensible options that aren't on this list?
Question 2: if not, then which should we choose? (Or we could choose
both, I suppose, depending on what the left argument looks like.)
Between these two options, I like (a) and don't like (b). The
pretending-to-be-a-list-of-lists special case behaviour for
multidimensional arrays is already weird and confusing, and besides,
I'd expect equality comparison on arrays to use ==, not np.array_equal.
So (b) feels pretty inconsistent with other numpy conventions to me.
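To make the distinction concrete, here is a small sketch contrasting the two kinds of comparison on the arrays from the quoted session; the variable names `elementwise` and `whole_rows` are illustrative only:

```python
import numpy as np

b = np.arange(10).reshape(5, 2)
a = np.array([0, 2])

# `==` broadcasts `a` across the rows of `b` and compares element by element.
elementwise = (b == a)

# np.array_equal compares two arrays as wholes and returns a single bool.
whole_rows = [np.array_equal(row, a) for row in b]

print(elementwise.shape, elementwise.any(), any(whole_rows))
```

The elementwise comparison finds a match (the 0 in the first row), while no row equals `a` as a whole.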
> I have opened an issue for it:
> In : a = np.array([0, 2])
> In : b = np.arange(10).reshape(5,2)
> In : b
> array([[0, 1],
>        [2, 3],
>        [4, 5],
>        [6, 7],
>        [8, 9]])
> In : a in b
> Out: True
> In : (b == a).any()
> Out: True
> In : (b == a).all(0).any() # the 0 could be multiple axes
> Out: False
> In : a_2d = a[None,:]
> In : a_2d in b # broadcast dimension means "any" -> True
> Out: True
> In : [0, 1] in b[:,:1] # should not work (or be False, not True)
> Out: True