[Numpy-discussion] unique() should return a sorted array

Tim Hochberg tim.hochberg at cox.net
Tue Jul 11 11:02:21 CDT 2006

Norbert Nemec wrote:
> unique1d is based on ediff1d, so it really calculates many differences
> and compares those to 0.0
> This is inefficient, even though this is hidden by the general
> inefficiency of Python (It might be the reason for the two milliseconds,
> though)
> What is more: subtraction works only for numbers, while the various
> proposed versions use only comparison which works for any data type (as
> long as it can be sorted)
My first question is: why? What's the attraction in returning a sorted 
answer here? Returning an unsorted array is potentially faster, 
depending on the algorithm chosen, and sorting after the fact is 
trivial. If one were going to spend extra complexity on something, I'd 
think it would be better spent on preserving the input order.
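
To illustrate what preserving the input order could look like, here is a 
minimal sketch in plain Python. The name unique_preserving_order is 
hypothetical and is not taken from any of the proposed versions; it 
assumes the items are hashable.

```python
def unique_preserving_order(seq):
    """Return the distinct items of seq in order of first appearance.

    Hypothetical helper for illustration only; assumes hashable items.
    """
    seen = set()
    result = []
    for item in seq:
        if item not in seen:
            # First time we meet this item: remember it and keep it.
            seen.add(item)
            result.append(item)
    return result


first_seen = unique_preserving_order([3, 1, 3, 2, 1])
```

Anyone who wants sorted output can still call sorted() on the result, 
which is the "sorting after the fact" mentioned above.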

Second, some objects can be compared for equality and hashed, but not 
sorted (Python's complex numbers come to mind). If one is going to 
worry about subtraction so as to keep things general, it makes sense to 
avoid sorting as well, Sasha's slick algorithm notwithstanding.
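
A small sketch of the point about complex numbers, using only built-in 
Python objects (the variable names are made up for the example). 
Python's built-in complex values define equality and hashing but no 
ordering, so a set-based approach works where sorted() does not.

```python
values = [1 + 2j, 3j, 1 + 2j]

# Equality and hashing are enough to collect the distinct values.
distinct = set(values)

# Ordering comparisons are not defined for Python's built-in complex
# type, so sorting raises TypeError.
try:
    sorted(values)
    sortable = True
except TypeError:
    sortable = False
```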

Third, whatever the outcome of the sorting issue, I propose that unique 
have the same interface as the other structural array operations. 
That is:

unique(anarray, axis=0): 

The default axis=0 is for compatibility with the other, somewhat similar 
functions. axis=None would return the flattened, uniquified data; 
axis=# would uniquify the array along that axis.
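
One possible reading of this interface, sketched in plain Python on a 
list of lists so that it runs without numpy. Everything here is an 
assumption made for illustration: the names unique_axis and _dedupe are 
hypothetical, "uniquify along an axis" is taken to mean dropping 
duplicate rows (axis=0) or duplicate columns (axis=1), and first-seen 
order is kept so the sketch takes no side on the sorting question.

```python
def _dedupe(items):
    """Drop repeated hashable items, keeping first-seen order."""
    seen = set()
    kept = []
    for item in items:
        if item not in seen:
            seen.add(item)
            kept.append(item)
    return kept


def unique_axis(rows, axis=0):
    """Hypothetical sketch of unique(anarray, axis=...) for 2-D lists."""
    if axis is None:
        # Flatten, then uniquify the individual elements.
        return _dedupe([x for row in rows for x in row])
    if axis == 0:
        # Treat each row as one item.
        return [list(r) for r in _dedupe([tuple(r) for r in rows])]
    if axis == 1:
        # Treat each column as one item: transpose, dedupe, transpose back.
        kept_cols = _dedupe(list(zip(*rows)))
        return [list(r) for r in zip(*kept_cols)]
    raise ValueError("this sketch only handles axis=None, 0 or 1")


data = [[1, 2, 1],
        [3, 4, 3],
        [1, 2, 1]]
flat_unique = unique_axis(data, axis=None)
row_unique = unique_axis(data, axis=0)
col_unique = unique_axis(data, axis=1)
```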



> My own version tried to capture all possible cases that the current
> unique captures.
> Sasha's version only works for numpy arrays and has a problem for arrays
> with all identical entries.
> David's version only works for numpy arrays of types that can be
> converted to float.
> I would once more propose to use my own version as given before:
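> from numpy import asarray, concatenate
>
> def unique(arr, sort=True):
>     if hasattr(arr, 'flatten'):
>         # numpy path: sort a flattened copy, then keep every element
>         # that differs from its predecessor
>         tmp = arr.flatten()
>         tmp.sort()
>         idx = concatenate(([True], tmp[1:] != tmp[:-1]))
>         return tmp[idx]
>     else:  # for compatibility with plain sequences
>         seen = {}
>         for item in arr:
>             seen[item] = None
>         if sort:
>             return asarray(sorted(seen.keys()))
>         else:
>             return asarray(list(seen.keys()))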
> Greetings,
> Norbert
