[Numpy-discussion] unique() should return a sorted array

Robert Kern robert.kern at gmail.com
Sun Jul 2 05:37:29 CDT 2006


Norbert Nemec wrote:
> I agree.
> 
> Currently the order of the output of unique is undefined. Defining it in
> such a way that it produces a sorted array will not break any compatibility.
> 
> My idea would be something like
> 
> from numpy import asarray, concatenate
>
> def unique(arr, sort=True):
>     if hasattr(arr, 'flatten'):
>         tmp = arr.flatten()
>         tmp.sort()
>         # keep the first element of each run of equal values
>         idx = concatenate(([True], tmp[1:] != tmp[:-1]))
>         return tmp[idx]
>     else:  # for compatibility:
>         set = {}
>         for item in arr:
>             set[item] = None
>         if sort:
>             return asarray(sorted(set.keys()))
>         else:
>             return asarray(list(set.keys()))
> 
> Does anybody know about the internals of the Python "set"? How is
> .keys() implemented? I have serious doubts about the efficiency
> of this method.

Well, that's a dictionary, not a set, but they both use the same algorithm. They
are both hash tables. If you need more specific details about how the hash
tables are implemented, the source (Objects/dictobject.c) is the best place for them.
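
To make the dictionary/set point concrete, here is a minimal sketch in plain
Python. The function names unique_via_dict and unique_via_set are made up for
this illustration and are not part of numpy. Both versions deduplicate through
a hash table, and the explicit sorted() call is what gives the result a
defined order.

```python
def unique_via_dict(items):
    # Dictionary used as a set, as in the quoted fallback branch:
    # each distinct item becomes a key, the value is ignored.
    seen = {}
    for item in items:
        seen[item] = None
    # Sort explicitly so the output order is well defined.
    return sorted(seen.keys())


def unique_via_set(items):
    # Same idea using the built-in set type, which is also a hash table.
    return sorted(set(items))


data = [3, 1, 2, 3, 1, 7]
print(unique_via_dict(data))  # [1, 2, 3, 7]
print(unique_via_set(data))   # [1, 2, 3, 7]
```

Either way the items must be hashable, and building the table takes roughly
linear time on average; the sort on top of that is what costs O(n log n).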

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco




