[Numpy-discussion] strange behavior of numpy.unique

Charles R Harris charlesr.harris@gmail....
Wed Nov 7 15:48:05 CST 2012


On Tue, Nov 6, 2012 at 7:52 PM, Warren Weckesser <warren.weckesser@gmail.com> wrote:

>
>
> On Tue, Nov 6, 2012 at 8:27 PM, Phillip Feldman <phillip.m.feldman@gmail.com> wrote:
>
>> numpy.unique behaves as I would expect for small inputs like the
>> following:
>>
>> In [12]: x= [0, 0, 1, 0, 1, 2, 0, 1, 2, 3]
>>
>> In [13]: unique(x, return_index=True)
>> Out[13]: (array([0, 1, 2, 3]), array([0, 2, 5, 9], dtype=int64))
>>
>> But when I give it something larger, the returned index values do not
>> always correspond to the first occurrences in the input. The documentation
>> is silent on how the returned index values are chosen when a given element
>> of x appears more than once. Either the documentation should be clarified
>> or, better yet, the behavior should be changed.
>>
>
>
> In fact, it was changed (in the master branch on github) several months
> ago, but there has not yet been a release with the changes.  The sort
> method that np.unique passes to np.argsort is now 'mergesort', and the
> docstring states that the indices returned are for the first occurrences of
> the unique elements.  The new docstring is here:
> http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.unique.html#numpy.unique
>
> See
> https://github.com/numpy/numpy/commit/dbf235169ed3386b359caaa9217f5280bf1d6749
> for the commit, and
> https://github.com/numpy/numpy/blob/master/numpy/lib/arraysetops.py for
> the latest version of the source.
>
>
That change was backported to 1.6.2, but doesn't work for record/object
arrays. That oversight is fixed in master.
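
For anyone following along, here is a rough sketch of why a stable sort
yields first-occurrence indices. It is an illustration only, not the code in
arraysetops.py; the helper name first_occurrence_unique is made up for this
example, and it assumes a plain 1-D numeric input.

```python
import numpy as np


def first_occurrence_unique(x):
    """Hypothetical helper: unique values plus the index of each first occurrence."""
    x = np.asarray(x).ravel()
    if x.size == 0:
        return x, np.empty(0, dtype=np.intp)

    # A stable sort keeps equal elements in their original relative order,
    # so within each run of equal values the smallest original index comes first.
    order = np.argsort(x, kind='mergesort')
    sorted_x = x[order]

    # True at the start of every run of equal values in the sorted array.
    is_first = np.concatenate(([True], sorted_x[1:] != sorted_x[:-1]))

    return sorted_x[is_first], order[is_first]


x = [0, 0, 1, 0, 1, 2, 0, 1, 2, 3]
values, first_index = first_occurrence_unique(x)
print(values)       # [0 1 2 3]
print(first_index)  # [0 2 5 9]
```

With an unstable sort such as quicksort, the order of equal elements after
sorting is not guaranteed, which would explain indices that do not point at
the first occurrence for larger inputs.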

Chuck