[Numpy-discussion] What should np.ndarray.__contains__ do

Todd toddrjen@gmail....
Mon Feb 25 11:01:53 CST 2013


The problem with (b) is that it breaks down if the two arrays have the same
dimensionality.

I think a better approach would be for

a in b

with a having n dimensions, to return True if there is any subarray of b
that matches a along the last n dimensions.

So if a has 3 dimensions and b has 6, a in b is true iff there are some i, j,
k, m, n, p such that every element of

a == b[i, j, k,
       m:m+a.shape[0],
       n:n+a.shape[1],
       p:p+a.shape[2]]

is True.

This isn't a very clear way to describe it, but I think it is consistent
with the concept of a being a subarray of b, even when they have the same
dimensionality.
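
A minimal sketch of this rule, in case it makes the description easier to
follow. It assumes NumPy >= 1.20 (for sliding_window_view), and
contains_subarray is a made-up name for illustration, not an existing or
proposed NumPy function. Treating a 0-d a as plain element membership is
also an assumption on my part.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def contains_subarray(b, a):
    """Hypothetical helper: True if `a` equals some contiguous block of `b`
    taken along the last a.ndim dimensions of `b`."""
    a = np.asarray(a)
    b = np.asarray(b)
    n = a.ndim
    if n == 0:
        # Assumed scalar case: ordinary element membership.
        return bool((b == a).any())
    if n > b.ndim:
        return False
    # A window cannot be larger than the axis it slides over.
    if any(w > d for w, d in zip(a.shape, b.shape[b.ndim - n:])):
        return False
    trailing = tuple(range(b.ndim - n, b.ndim))
    # Result shape: (reduced dimensions of b) + a.shape
    windows = sliding_window_view(b, a.shape, axis=trailing)
    window_axes = tuple(range(-n, 0))
    # A window matches only if all of its elements equal `a`.
    return bool((windows == a).all(axis=window_axes).any())


b = np.arange(10).reshape(5, 2)
print(contains_subarray(b, [2, 3]))            # a row of b
print(contains_subarray(b, [0, 2]))            # elements exist, but not as a block
print(contains_subarray(b, [[2, 3], [4, 5]]))  # same dimensionality as b
```

With this definition the same-dimensionality case needs no special handling:
a 2-d a is simply searched for as a block inside a 2-d b.
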
On Feb 25, 2013 5:34 PM, "Nathaniel Smith" <njs@pobox.com> wrote:

> On Mon, Feb 25, 2013 at 3:10 PM, Sebastian Berg
> <sebastian@sipsolutions.net> wrote:
> > Hello all,
> >
> > currently the `__contains__` method or the `in` operator on arrays, does
> > not return what the user would expect when in the operation `a in b` the
> > `a` is not a single element (see "In [3]-[4]" below).
>
> True, I did not expect that!
>
> > The first solution coming to mind might be checking `all()` for all
> > dimensions given in argument `a` (see line "In [5]" for a simplistic
> > example). This does not play too well with broadcasting however, but one
> > could maybe simply *not* broadcast at all (i.e. a.shape ==
> > b.shape[b.ndim-a.ndim:]) and raise an error/return False otherwise.
> >
> > On the other hand one could say broadcasting of `a` onto `b` should be
> > "any" along that dimension (see "In [8]"). The other way should maybe
> > raise an error though (see "In [9]" to understand what I mean).
> >
> > I think using broadcasting dimensions where `a` is repeated over `b` as
> > the dimensions to use "any" logic on is the most general way for numpy
> > to handle this consistently, while the other way around could be handled
> > with an `all` but to me makes so little sense that I think it should be
> > an error. Of course this is different to a list of lists, which gives
> > False in these cases, but arrays are not list of lists...
> >
> > As a side note, since for loop, etc.  use "for item in array", I do not
> > think that vectorizing along `a` as np.in1d does is reasonable. `in`
> > should return a single boolean.
>
> Python effectively calls bool() on the return value from __contains__,
> so reasonableness doesn't even come into it -- the only possible
> behaviours for `in` are to return True, False, or raise an exception.
>
> I admit that I don't actually really understand any of this discussion
> of broadcasting. in's semantics are, "is this scalar in this
> container"? (And the scalarness is enforced by Python, as per above.)
> So I think we should find some approach where the left argument is
> treated as a scalar.
>
> The two approaches that I can see, and which generalize the behaviour
> of simple Python lists in natural ways, are:
>
> a) the left argument is coerced to a scalar of the appropriate type,
> then we check if that value appears anywhere in the array (basically
> raveling the right argument).
>
> b) for an array with shape (n1, n2, n3, ...), the left argument is
> treated as an array of shape (n2, n3, ...), and we check if that
> subarray (as a whole) appears anywhere in the array. Or in other
> words, 'A in B' is true iff there is some i such that
> np.array_equal(B[i], A).
>
> Question 1: are there any other sensible options that aren't on this list?
>
> Question 2: if not, then which should we choose? (Or we could choose
> both, I suppose, depending on what the left argument looks like.)
>
> Between these two options, I like (a) and don't like (b). The
> pretending-to-be-a-list-of-lists special case behaviour for
> multidimensional arrays is already weird and confusing, and besides,
> I'd expect equality comparison on arrays to use ==, not array_equal.
> So (b) feels pretty inconsistent with other numpy conventions to me.
>
> -n
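
For concreteness, options (a) and (b) above could be sketched roughly as
follows. The names contains_scalar and contains_row are hypothetical, and
raising TypeError for a non-scalar left argument in (a) is only one possible
reading of "coerced to a scalar".

```python
import numpy as np


def contains_scalar(B, x):
    """Option (a): treat the left argument as a scalar and look for it
    anywhere in B (equivalent to searching the raveled array)."""
    B = np.asarray(B)
    x = np.asarray(x)
    if x.ndim != 0:
        # Assumption: non-scalars are rejected rather than coerced.
        raise TypeError("left argument must be a scalar")
    return bool((B == x).any())


def contains_row(B, A):
    """Option (b): True iff np.array_equal(B[i], A) for some i.
    Assumes B has at least one dimension."""
    B = np.asarray(B)
    A = np.asarray(A)
    return any(np.array_equal(B[i], A) for i in range(B.shape[0]))


b = np.arange(10).reshape(5, 2)
print(contains_scalar(b, 7))
print(contains_row(b, [2, 3]))
print(contains_row(b, [0, 2]))  # `[0, 2] in b` currently gives True
```

np.array_equal returns False on a shape mismatch, so under (b) a left
argument of the wrong shape is simply "not in" the array instead of being
broadcast.
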
>
> > I have opened an issue for it:
> > https://github.com/numpy/numpy/issues/3016#issuecomment-14045545
> >
> >
> > Regards,
> >
> > Sebastian
> >
> > In [1]: a = np.array([0, 2])
> >
> > In [2]: b = np.arange(10).reshape(5,2)
> >
> > In [3]: b
> > Out[3]:
> > array([[0, 1],
> >        [2, 3],
> >        [4, 5],
> >        [6, 7],
> >        [8, 9]])
> >
> > In [4]: a in b
> > Out[4]: True
> >
> > In [5]: (b == a).any()
> > Out[5]: True
> >
> > In [6]: (b == a).all(0).any() # the 0 could be multiple axes
> > Out[6]: False
> >
> > In [7]: a_2d = a[None,:]
> >
> > In [8]: a_2d in b # broadcast dimension means "any" -> True
> > Out[8]: True
> >
> > In [9]: [0, 1] in b[:,:1] # should not work (or be False, not True)
> > Out[9]: True
> >
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion