[Numpy-discussion] What should np.ndarray.__contains__ do

Sebastian Berg sebastian@sipsolutions....
Mon Feb 25 11:49:13 CST 2013


On Mon, 2013-02-25 at 18:01 +0100, Todd wrote:
> The problem with b is that it breaks down if the two arrays have the
> same dimensionality. 
> 
> I think a better approach would be for 
> 
> a in b
> 
> With a having n dimensions, it returns true if there is any subarray
> of b that matches a along the last n dimensions.
> 
> So if a has 3 dimensions and b has 6, a in b is true iff there is any
> i, j, k, m, n, p such that
> 
> a == b[i, j, k,
>        m:m+a.shape[0],
>        n:n+a.shape[1],
>        p:p+a.shape[2]]
> 
> This isn't a very clear way to describe it, but I think it is
> consistent with the concept of a being a subarray of b even when they
> have the same dimensionality. 
> 
Oh, great point. I guess this is the most general way; I completely missed
this option. It allows [0, 3] in [1, 0, 3, 5] to be true. I am not sure
whether this kind of matching should be part of the in operator, though.
On the other hand, it would only do something where an error would
otherwise be thrown, and it is definitely useful and compatible with
what anyone else might expect.
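
To make the proposal concrete, here is a rough sketch of the matching rule
as I read it. The name contains_subarray is made up for illustration, and it
assumes a NumPy recent enough to provide sliding_window_view (1.20 or
later). It is not a proposed implementation, only one way to state the rule:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def contains_subarray(a, b):
    """Hypothetical helper: True if `a` equals some window of `b`
    taken along the last a.ndim axes of `b`."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.ndim == 0:
        # Scalar case: plain element lookup.
        return bool((b == a).any())
    if a.ndim > b.ndim:
        return False
    trailing = b.shape[b.ndim - a.ndim:]
    if any(n > m for n, m in zip(a.shape, trailing)):
        return False  # `a` cannot fit inside `b`
    axes = tuple(range(b.ndim - a.ndim, b.ndim))
    # Shape is (leading dims..., window positions..., *a.shape).
    windows = sliding_window_view(b, a.shape, axis=axes)
    # A window matches only if every element agrees with `a`.
    hits = (windows == a).all(axis=tuple(range(-a.ndim, 0)))
    return bool(hits.any())


print(contains_subarray([0, 3], [1, 0, 3, 5]))  # True
print(contains_subarray([3, 0], [1, 0, 3, 5]))  # False
```

With this rule, the same-dimensionality case falls out naturally, since
the window then slides over every axis of b.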

> On Feb 25, 2013 5:34 PM, "Nathaniel Smith" <njs@pobox.com> wrote:
>         On Mon, Feb 25, 2013 at 3:10 PM, Sebastian Berg
>         <sebastian@sipsolutions.net> wrote:
>         > Hello all,
>         >
>         > currently the `__contains__` method or the `in` operator on
>         > arrays does not return what the user would expect when, in the
>         > operation `a in b`, `a` is not a single element (see "In [3]-[4]"
>         > below).
>         
>         True, I did not expect that!
>         
>         > The first solution coming to mind might be checking `all()` for all
>         > dimensions given in argument `a` (see line "In [6]" for a simplistic
>         > example). This does not play too well with broadcasting however, but
>         > one could maybe simply *not* broadcast at all (i.e. require a.shape ==
>         > b.shape[b.ndim-a.ndim:]) and raise an error/return False otherwise.
>         >
>         > On the other hand one could say broadcasting of `a` onto `b` should
>         > be "any" along that dimension (see "In [8]"). The other way should
>         > maybe raise an error though (see "In [9]" to understand what I mean).
>         >
>         > I think using broadcasting dimensions where `a` is repeated over `b`
>         > as the dimensions to use "any" logic on is the most general way for
>         > numpy to handle this consistently, while the other way around could
>         > be handled with an `all` but to me makes so little sense that I think
>         > it should be an error. Of course this is different from a list of
>         > lists, which gives False in these cases, but arrays are not lists of
>         > lists...
>         >
>         > As a side note, since for loops etc. use "for item in array", I do
>         > not think that vectorizing along `a` as np.in1d does is reasonable.
>         > `in` should return a single boolean.
>         
>         Python effectively calls bool() on the return value from __contains__,
>         so reasonableness doesn't even come into it -- the only possible
>         behaviours for `in` are to return True, False, or raise an exception.
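
A small demonstration of that coercion, as a sketch only. The two classes
below are invented for illustration and are not part of NumPy. The first
returns a non-empty list from __contains__, the second a boolean array:

```python
import numpy as np


class ListAnswer:
    """Hypothetical container whose __contains__ returns a list."""

    def __contains__(self, item):
        return [False, False]  # non-empty list, so it is truthy


class ArrayAnswer:
    """Hypothetical container whose __contains__ returns a boolean array."""

    def __contains__(self, item):
        return np.array([True, False])


# Python applies a truth test to the return value, so this is a plain bool.
list_result = 1 in ListAnswer()
print(list_result)  # True

try:
    1 in ArrayAnswer()
    array_error = None
except ValueError as exc:
    # The truth value of a multi-element array is ambiguous.
    array_error = exc
print(type(array_error).__name__)  # ValueError
```

So a vectorized result can never come out of `in`: it is either collapsed
to a single bool or it raises.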
>         
>         I admit that I don't actually really understand any of this discussion
>         of broadcasting. in's semantics are, "is this scalar in this
>         container"? (And the scalarness is enforced by Python, as per above.)
>         So I think we should find some approach where the left argument is
>         treated as a scalar.
>         
>         The two approaches that I can see, and which generalize the behaviour
>         of simple Python lists in natural ways, are:
>         
>         a) the left argument is coerced to a scalar of the appropriate type,
>         then we check if that value appears anywhere in the array (basically
>         raveling the right argument).
>         
>         b) for an array with shape (n1, n2, n3, ...), the left argument is
>         treated as an array of shape (n2, n3, ...), and we check if that
>         subarray (as a whole) appears anywhere in the array. Or in other
>         words, 'A in B' is true iff there is some i such that
>         np.array_equal(B[i], A).
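
For reference, options (a) and (b) could be sketched roughly as follows.
The function names are made up for this mail, and the coercion step of (a)
is simplified to requiring a 0-d left argument, which is an assumption and
not something settled in the thread:

```python
import numpy as np


def contains_scalar(B, x):
    """Option (a), simplified: `x` must be 0-d; search all of B."""
    B = np.asarray(B)
    if np.ndim(x) != 0:
        raise ValueError("left argument must be a scalar")
    return bool((B == x).any())


def contains_row(B, A):
    """Option (b): True iff np.array_equal(B[i], A) for some i.

    Assumes B has at least one dimension.
    """
    B = np.asarray(B)
    return any(np.array_equal(B[i], A) for i in range(B.shape[0]))


b = np.arange(10).reshape(5, 2)
print(contains_scalar(b, 7))     # True
print(contains_row(b, [2, 3]))   # True
print(contains_row(b, [0, 2]))   # False
```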
>         
>         Question 1: are there any other sensible options that aren't on this
>         list?
>         
>         Question 2: if not, then which should we choose? (Or we could choose
>         both, I suppose, depending on what the left argument looks like.)
>         
>         Between these two options, I like (a) and don't like (b). The
>         pretending-to-be-a-list-of-lists special case behaviour for
>         multidimensional arrays is already weird and confusing, and besides,
>         I'd expect equality comparison on arrays to use ==, not array_equal.
>         So (b) feels pretty inconsistent with other numpy conventions to me.
>         
>         -n
>         
>         > I have opened an issue for it:
>         >
>         > https://github.com/numpy/numpy/issues/3016#issuecomment-14045545
>         >
>         >
>         > Regards,
>         >
>         > Sebastian
>         >
>         > In [1]: a = np.array([0, 2])
>         >
>         > In [2]: b = np.arange(10).reshape(5,2)
>         >
>         > In [3]: b
>         > Out[3]:
>         > array([[0, 1],
>         >        [2, 3],
>         >        [4, 5],
>         >        [6, 7],
>         >        [8, 9]])
>         >
>         > In [4]: a in b
>         > Out[4]: True
>         >
>         > In [5]: (b == a).any()
>         > Out[5]: True
>         >
>         > In [6]: (b == a).all(-1).any() # the -1 could be multiple axes
>         > Out[6]: False
>         >
>         > In [7]: a_2d = a[None,:]
>         >
>         > In [8]: a_2d in b # broadcast dimension means "any" -> True
>         > Out[8]: True
>         >
>         > In [9]: [0, 1] in b[:,:1] # should not work (or be False, not True)
>         > Out[9]: True
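
The non-broadcasting variant from the original mail (require a.shape ==
b.shape[b.ndim-a.ndim:], then `all` over the axes of `a` and `any` over the
rest) might look like the sketch below. The name contains_trailing is
hypothetical, and raising rather than returning False on a shape mismatch
is just one of the two choices mentioned:

```python
import numpy as np


def contains_trailing(a, b):
    """Hypothetical helper: `a` must match the trailing shape of `b` exactly."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.ndim > b.ndim or a.shape != b.shape[b.ndim - a.ndim:]:
        raise ValueError("shape of a must equal the trailing shape of b")
    if a.ndim == 0:
        return bool((b == a).any())
    # Reduce with all() over the axes covered by `a`, then any() over the rest.
    axes = tuple(range(b.ndim - a.ndim, b.ndim))
    return bool((b == a).all(axis=axes).any())


b = np.arange(10).reshape(5, 2)
print(contains_trailing([0, 2], b))  # False
print(contains_trailing([2, 3], b))  # True
```

Under this rule, the inputs of both "In [8]" and "In [9]" raise a
ValueError instead of returning True.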
>         >
>         >
>         > _______________________________________________
>         > NumPy-Discussion mailing list
>         > NumPy-Discussion@scipy.org
>         > http://mail.scipy.org/mailman/listinfo/numpy-discussion



