[Numpy-discussion] unique() should return a sorted array

David Huard david.huard at gmail.com
Tue Jul 11 13:27:37 CDT 2006


Tim Hochberg wrote:
> My first question is: why? What's the attraction in returning a sorted
> answer here? Returning an unsorted array is potentially faster,
> depending on the algorithm chosen,  and sorting after the fact is
> trivial. If one was going to spend extra complexity on something, I'd
> think it would be better spent on preserving the input order.

There is a unique function in matlab that returns a sorted vector. I think a
lot of people will expect numpy and matlab functions with identical names
to behave similarly.

If we want to preserve the input order, we'd have to choose a convention
about which occurrence of a repeated value determines its position: do we
keep the order of the first occurrence or of the last one?
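To make the two conventions concrete, here is a minimal pure-Python sketch. The helper names `unique_first` and `unique_last` are hypothetical, and the sketch assumes hashable elements. For `[3, 1, 3, 2, 1]`, keeping first occurrences gives `[3, 1, 2]`, while keeping last occurrences gives `[3, 2, 1]`.

```python
def unique_first(seq):
    """Order-preserving unique: each value keeps the position of its first occurrence."""
    seen = set()
    out = []
    for item in seq:
        if item not in seen:  # first time we meet this value
            seen.add(item)
            out.append(item)
    return out


def unique_last(seq):
    """Order-preserving unique: each value keeps the position of its last occurrence."""
    # Scanning backwards turns "last occurrence" into "first occurrence";
    # reversing the result restores forward order.
    return unique_first(list(seq)[::-1])[::-1]


print(unique_first([3, 1, 3, 2, 1]))  # [3, 1, 2]
print(unique_last([3, 1, 3, 2, 1]))   # [3, 2, 1]
```

Either choice costs one pass plus a hash lookup per element, so the decision is about semantics rather than speed.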

Here is the benchmark. Sorry, Norbert, for not including your code the first
time; it turns out that, with Alain's suggestion, it's the fastest one for
both lists and arrays.


from numpy.random import rand
x = rand(100000)*100
x = x.astype('i')
l = list(x)

For array x:

In [166]: timeit unique_alan(x) # with set instead of dict
100 loops, best of 3: 8.8 ms per loop

In [167]: timeit unique_norbert(x)
100 loops, best of 3: 8.8 ms per loop

In [168]: timeit unique_sasha(x)
100 loops, best of 3: 10.8 ms per loop

In [169]: timeit unique(x)
10 loops, best of 3: 50.4 ms per loop

In [170]: timeit unique1d(x)
10 loops, best of 3: 13.2 ms per loop


For list l:

In [196]: timeit unique_norbert(l)
10 loops, best of 3: 29 ms per loop

In [197]: timeit unique_alan(l)  # with set instead of dict
10 loops, best of 3: 14.5 ms per loop

In [193]: timeit unique(l)
10 loops, best of 3: 29.6 ms per loop
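For anyone wanting to rerun a comparison like this outside IPython, here is a rough sketch using the standard `timeit` module. The helper `unique_set` is hypothetical: it only illustrates the "set instead of dict, then sort" idea and is not the code from the scrubbed attachment. It is checked against the current `numpy.unique`, which returns a sorted array. Absolute timings will differ by machine and NumPy version.

```python
import timeit

import numpy as np


def unique_set(seq):
    """Hypothetical set-based unique: deduplicate by hashing, then sort the survivors."""
    return np.array(sorted(set(seq)))


# Same kind of input as the benchmark above: 100000 ints in [0, 100).
x = (np.random.rand(100000) * 100).astype('i')
l = list(x)

# Sanity check: the set-based version must agree with numpy's own unique.
assert np.array_equal(unique_set(x), np.unique(x))

cases = [
    ("unique_set(x)", lambda: unique_set(x)),
    ("np.unique(x)", lambda: np.unique(x)),
    ("unique_set(l)", lambda: unique_set(l)),
]
for name, stmt in cases:
    # Best of 3 repeats, 5 calls each, reported per call (like IPython's timeit).
    best = min(timeit.repeat(stmt, number=5, repeat=3)) / 5
    print(f"{name}: {best * 1e3:.1f} ms per loop")
```

A set wins when there are few distinct values (here at most 100), since only the survivors get sorted; with mostly distinct values a full sort-based approach becomes competitive.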


Note:
In Norbert's function, setting sort=False for flattenable objects returns a
sorted array anyway. So I'd suggest removing the sort keyword: sort if the
datatype is sortable, and don't sort if it's not.
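A minimal sketch of that policy, under the assumption that "sortable" means the elements can be compared with `<`. The name `unique_policy` is hypothetical, and the sketch assumes hashable elements. It returns sorted values when possible and falls back to first-seen order otherwise, for example for complex numbers or mixed types in Python 3.

```python
def unique_policy(seq):
    """Hypothetical unique() without a sort keyword:
    sort if the elements are mutually orderable, otherwise keep first-seen order."""
    seen = set()
    items = []
    for item in seq:
        if item not in seen:
            seen.add(item)
            items.append(item)
    try:
        return sorted(items)
    except TypeError:
        # Elements cannot be ordered (e.g. complex numbers, mixed str/int).
        return items


print(unique_policy([3, 1, 3, 2]))  # [1, 2, 3]
print(unique_policy([2j, 1j, 2j]))  # [2j, 1j]
```

Detecting sortability by attempting the sort keeps the function free of a type whitelist, at the cost of silently changing the output order for unorderable inputs.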

David
-------------- next part --------------
A non-text attachment was scrubbed...
Name: unique_test.py
Type: text/x-python-script
Size: 993 bytes
Desc: not available
Url : http://projects.scipy.org/pipermail/numpy-discussion/attachments/20060711/14d88ba8/attachment.bin 

