[Numpy-discussion] unique() should return a sorted array

Tim Hochberg tim.hochberg at cox.net
Tue Jul 11 11:06:23 CDT 2006


Tim Hochberg wrote:
> Norbert Nemec wrote:
>   
>> unique1d is based on ediff1d, so it really calculates many differences
>> and compares those to 0.0
>>
>> This is inefficient, even though this is hidden by the general
>> inefficiency of Python (It might be the reason for the two milliseconds,
>> though)
>>
>> What is more: subtraction works only for numbers, while the various
>> proposed versions use only comparison which works for any data type (as
>> long as it can be sorted)
>>   
>>     
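The comparison-only idea in the quoted paragraph can be sketched as follows. This is a minimal illustration, assuming present-day NumPy imported as `np`; the name `unique_by_comparison` is hypothetical and is not any of the versions proposed in this thread. It uses only sorting and `!=`, so it also handles string arrays, for which subtraction is not defined.

```python
import numpy as np

def unique_by_comparison(values):
    """Sort a flattened copy, then keep each element that differs from its predecessor."""
    tmp = np.asarray(values).flatten()  # flatten() returns a copy, so the input is untouched
    if tmp.size == 0:
        return tmp
    tmp.sort()
    # The first element is always kept; the rest are kept only where the value changes.
    keep = np.concatenate(([True], tmp[1:] != tmp[:-1]))
    return tmp[keep]

names = np.array(["pear", "apple", "pear", "fig", "apple"])
result = unique_by_comparison(names)
print(result)
```

Because the mask always starts with `True`, an array whose entries are all identical still yields one element.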
> My first question is: why? What's the attraction in returning a sorted 
> answer here? Returning an unsorted array is potentially faster, 
> depending on the algorithm chosen, and sorting after the fact is 
> trivial. If one was going to spend extra complexity on something, I'd 
> think it would be better spent on preserving the input order.
>
> Second, some objects can be compared for equality and hashed, but not 
> sorted (Python's complex numbers come to mind). If one is going to 
> avoid subtraction so as to keep things general, it makes sense to 
> avoid sorting as well, Sasha's slick algorithm notwithstanding.
>
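The first two points can be illustrated together with a hash-based sketch that preserves input order and never sorts. It is plain Python with a hypothetical function name, `unique_preserving_order`, and is not one of the implementations discussed in this thread. It assumes only that the items are hashable and comparable for equality, which holds for complex numbers.

```python
def unique_preserving_order(items):
    """Return the distinct items in first-seen order, using hashing instead of sorting."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

values = [2 + 1j, 1 - 1j, 2 + 1j, 3j, 1 - 1j]
ordered = unique_preserving_order(values)
print(ordered)

# Sorting the same values fails, because complex numbers define no ordering.
try:
    sorted(values)
    sortable = True
except TypeError:
    sortable = False
print(sortable)
```

A caller who wants sorted output can still sort the result afterwards when the element type allows it.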
> Third, I propose that, whatever the outcome of the sorting issue, 
> unique have the same interface as the other structural 
> array operations. That is:
>
> unique(anarray, axis=0): 
>    ...
>
> The default axis=0 is for compatibility with the other, somewhat similar 
> functions. axis=None would return the flattened, uniquified data; 
> axis=N would uniquify the result along that axis.
>   
Hmmm. Of course that precludes it returning an actual array for 
axis!=None. That might be considered suboptimal...
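To make the problem concrete, here is a small sketch of one possible reading of "uniquify along an axis", in which each row is uniquified separately. It assumes present-day NumPy and its `np.unique`; the helper name `unique_per_row` is hypothetical. The rows come back with different lengths, so the result has to be a list of arrays rather than one rectangular array.

```python
import numpy as np

def unique_per_row(a):
    """Hypothetical helper: uniquify each row of a 2-D array separately."""
    return [np.unique(row) for row in np.asarray(a)]

a = np.array([[1, 1, 2],
              [3, 4, 5]])
rows = unique_per_row(a)
lengths = [len(r) for r in rows]
print(lengths)  # the rows no longer share a common length
```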

-tim


> Regards,
>
> -tim
>
>   
>> My own version tried to capture all possible cases that the current
>> unique captures.
>>
>> Sasha's version only works for numpy arrays and has a problem for arrays
>> with all identical entries.
>>
>> David's version only works for numpy arrays of types that can be
>> converted to float.
>>
>> I would once more propose to use my own version as given before:
>>
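>> from numpy import asarray, concatenate
>>
>> def unique(arr, sort=True):
>>     if hasattr(arr, 'flatten'):
>>         tmp = arr.flatten()  # flatten() returns a copy
>>         tmp.sort()
>>         # keep the first element and each one that differs from its predecessor
>>         idx = concatenate(([True], tmp[1:] != tmp[:-1]))
>>         return tmp[idx]
>>     else:  # for compatibility with plain sequences
>>         seen = {}
>>         for item in arr:
>>             seen[item] = None
>>         if sort:
>>             return asarray(sorted(seen.keys()))
>>         else:
>>             return asarray(list(seen.keys()))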
>>
>>
>> Greetings,
>> Norbert
>>
>>
>>
>> -------------------------------------------------------------------------
>> Using Tomcat but need to do more? Need to support web services, security?
>> Get stuff done quickly with pre-integrated technology to make your job easier
>> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
>> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
>> _______________________________________________
>> Numpy-discussion mailing list
>> Numpy-discussion at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/numpy-discussion
>>
>>
>>   
>>     
>
>
>
>
>
>
>   





