[Numpy-discussion] What should np.ndarray.__contains__ do
Sebastian Berg
sebastian@sipsolutions....
Tue Feb 26 04:21:29 CST 2013
On Mon, 2013-02-25 at 16:33 +0000, Nathaniel Smith wrote:
> On Mon, Feb 25, 2013 at 3:10 PM, Sebastian Berg
> <sebastian@sipsolutions.net> wrote:
> > Hello all,
> >
> > currently the `__contains__` method (the `in` operator) on arrays does
> > not return what the user would expect when, in the operation `a in b`,
> > `a` is not a single element (see "In [3]-[4]" below).
>
> True, I did not expect that!
>
<snip>
> The two approaches that I can see, and which generalize the behaviour
> of simple Python lists in natural ways, are:
>
> a) the left argument is coerced to a scalar of the appropriate type,
> then we check if that value appears anywhere in the array (basically
> raveling the right argument).
>
How did I misread that? I guess you mean element and never subarray
matching. Actually I am starting to think that is best. Subarray
matching may be useful, but would probably be better off inside its own
function.
That might also be best for object arrays, since it is difficult to
know whether the user means a tuple as a single element or as a
two-element subarray, unless you say "input is array-like", which is
possible (or at least more sensible) for a function.
That would mean simply turning the use cases that currently give weird
results into errors. Those errors could perhaps point to np.in1d and, if
numpy gets one, to a dedicated subarray matching function.
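
A rough sketch of what that split could look like, purely as an
illustration. The names `contains_element` and `contains_subarray` are
hypothetical (numpy has no such functions), and the error messages are
made up:

```python
import numpy as np

def contains_element(arr, value):
    """Element matching only: is the scalar `value` equal to any element?"""
    value = np.asarray(value)
    if value.ndim != 0:
        # A non-scalar left operand becomes an error instead of a
        # surprising broadcast result.
        raise ValueError("expected a single element, got an array-like")
    return bool((np.asarray(arr).ravel() == value).any())

def contains_subarray(arr, sub):
    """Hypothetical helper: is there some i with arr[i] equal to `sub`?"""
    arr = np.asarray(arr)
    sub = np.asarray(sub)  # the input is explicitly treated as array-like
    if sub.shape != arr.shape[1:]:
        raise ValueError("subarray shape must match arr.shape[1:]")
    return any(np.array_equal(row, sub) for row in arr)

b = np.arange(10).reshape(5, 2)
print(contains_element(b, 2))        # True
print(contains_subarray(b, [2, 3]))  # True
print(contains_subarray(b, [0, 2]))  # False
```

Keeping the two checks in separate functions means a tuple or list passed
to the second one is unambiguously a subarray, which is the object-array
problem mentioned above.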
-- Sebastian
> b) for an array with shape (n1, n2, n3, ...), the left argument is
> treated as an array of shape (n2, n3, ...), and we check if that
> subarray (as a whole) appears anywhere in the array. Or in other
> words, 'A in B' is true iff there is some i such that
> np.array_equal(B[i], A).
>
> Question 1: are there any other sensible options that aren't on this list?
>
> Question 2: if not, then which should we choose? (Or we could choose
> both, I suppose, depending on what the left argument looks like.)
>
> Between these two options, I like (a) and don't like (b). The
> pretending-to-be-a-list-of-lists special case behaviour for
> multidimensional arrays is already weird and confusing, and besides,
> I'd expect equality comparison on arrays to use ==, not array_equal.
> So (b) feels pretty inconsistent with other numpy conventions to me.
>
> -n
>
> > I have opened an issue for it:
> > https://github.com/numpy/numpy/issues/3016#issuecomment-14045545
> >
> >
> > Regards,
> >
> > Sebastian
> >
> > In [1]: a = np.array([0, 2])
> >
> > In [2]: b = np.arange(10).reshape(5,2)
> >
> > In [3]: b
> > Out[3]:
> > array([[0, 1],
> >        [2, 3],
> >        [4, 5],
> >        [6, 7],
> >        [8, 9]])
> >
> > In [4]: a in b
> > Out[4]: True
> >
> > In [5]: (b == a).any()
> > Out[5]: True
> >
> > In [6]: (b == a).all(0).any() # the 0 could be multiple axes
> > Out[6]: False
> >
> > In [7]: a_2d = a[None,:]
> >
> > In [8]: a_2d in b # broadcast dimension means "any" -> True
> > Out[8]: True
> >
> > In [9]: [0, 1] in b[:,:1] # should not work (or be False, not True)
> > Out[9]: True
> >
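
For reference, the session above can be replayed as a small script that
spells out the broadcasting behind each result. This is only a sketch:
the variable names are chosen here for illustration, and the whole-row
check reduces over the trailing axis, which is an assumption about what
a subarray match should mean for a 2-d array:

```python
import numpy as np

a = np.array([0, 2])
b = np.arange(10).reshape(5, 2)

# `b == a` broadcasts `a` against every row of `b`, giving a (5, 2) mask.
mask = (b == a)

# True as soon as any single position agrees (here b[0, 0] == a[0]).
any_element = bool(mask.any())

# True only if some row of `b` equals `a` in every position.
whole_row = bool(mask.all(axis=-1).any())

# A (5, 1) column compared with a length-2 list also broadcasts to (5, 2),
# so a single matching value is enough to make the result True.
column_case = bool((b[:, :1] == [0, 1]).any())

print(any_element, whole_row, column_case)
```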
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion