[Numpy-discussion] strange behavior of numpy.unique

josef.pktd@gmai... josef.pktd@gmai...
Wed Nov 7 11:24:04 CST 2012


On Tue, Nov 6, 2012 at 9:52 PM, Warren Weckesser
<warren.weckesser@gmail.com> wrote:
>
>
> On Tue, Nov 6, 2012 at 8:27 PM, Phillip Feldman
> <phillip.m.feldman@gmail.com> wrote:
>>
>> numpy.unique behaves as I would expect for small inputs like the
>> following:
>>
>> In [12]: x = [0, 0, 1, 0, 1, 2, 0, 1, 2, 3]
>>
>> In [13]: unique(x, return_index=True)
>> Out[13]: (array([0, 1, 2, 3]), array([0, 2, 5, 9], dtype=int64))
>>
>> But, when I give it something larger, the return index values do not
>> always correspond to the first occurrences in the input. The documentation
>> is silent on the question of how the return index values are chosen when a
>> given element of x appears more than once. Either the documentation should
>> be clarified or, better yet, the behavior should be changed.
>
>
>
> In fact, it was changed (in the master branch on github) several months ago,
> but there has not yet been a release with the changes.  The sort method that
> np.unique passes to np.argsort is now 'mergesort', and the docstring states
> that the indices returned are for the first occurrences of the unique
> elements.  The new docstring is here:
> http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.unique.html#numpy.unique
>
> See
> https://github.com/numpy/numpy/commit/dbf235169ed3386b359caaa9217f5280bf1d6749
> for the commit, and
> https://github.com/numpy/numpy/blob/master/numpy/lib/arraysetops.py for the
> latest version of the source.

I think that change is already in 1.6.2 and, IIRC, it broke return_index
for structured dtypes.
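
For reference, a minimal sketch of why a stable sort such as 'mergesort'
gives first-occurrence indices. This is an illustration only, not the code
in arraysetops.py; the names perm, sorted_x, flag and first_index are made
up for the example. A stable argsort keeps equal elements in their original
order, so the first entry of each run of equal values in the sorted array
points at the first occurrence in the input.

```python
import numpy as np

x = np.array([0, 0, 1, 0, 1, 2, 0, 1, 2, 3])

# Stable sort: ties keep their original relative order.
perm = np.argsort(x, kind='mergesort')
sorted_x = x[perm]

# Mark the start of each run of equal values in the sorted array.
flag = np.concatenate(([True], sorted_x[1:] != sorted_x[:-1]))

values = sorted_x[flag]      # the unique values
first_index = perm[flag]     # index of the first occurrence of each value

print(values)
print(first_index)
```

With an unstable sort kind, the order within each run of equal values is
not guaranteed, so perm[flag] could pick any occurrence, which would
explain the behavior reported for larger inputs.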

Josef


>
> Warren
>
>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>

