[Numpy-discussion] Efficient removal of duplicates

Sturla Molden sturla@molden...
Tue Dec 16 06:24:52 CST 2008


There was a discussion about this on c.l.p (comp.lang.python) a while ago. Using a sort
will scale like O(n log n) or worse, whereas using a set (hash table) will
scale like amortized O(n). How to use a Python set to get a unique
collection of objects I'll leave to your imagination.
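
One way the set-based approach could look, as a minimal sketch rather than a tuned implementation. The function name unique_rows_via_set and the sample data are made up for illustration; rows are converted to tuples because lists and arrays are not hashable.

```python
import numpy as np

def unique_rows_via_set(a):
    """Return the distinct rows of a 2-D array in first-seen order (illustrative helper)."""
    seen = set()
    out = []
    for row in a.tolist():
        key = tuple(row)          # tuples are hashable, lists are not
        if key not in seen:       # average O(1) membership test
            seen.add(key)
            out.append(key)
    return np.asarray(out, dtype=a.dtype)

# Hypothetical sample data: (x, y) pairs with repeats.
a = np.array([(1, 2), (3, 4), (1, 2), (5, 6), (3, 4)])
a_unique = unique_rows_via_set(a)
```

Unlike a sort, this keeps the rows in the order they first appear. The loop runs in Python, so for small or moderate n a sort done in C may still be faster despite the worse asymptotic scaling.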

Sturla Molden

> On Mon, Dec 15, 2008 at 18:24, Daran Rife <drife@ucar.edu> wrote:
>> How about a solution inspired by recipe 18.1 in the Python Cookbook,
>> 2nd Ed:
>>
>> import numpy as np
>>
>> a = np.array([(x0, y0), (x1, y1), ...])
>> l = a.tolist()
>> l.sort()
>> # keep an element only if it differs from its predecessor in sorted order
>> unique = [x for i, x in enumerate(l) if not i or x != l[i-1]]
>> a_unique = np.asarray(unique)
>>
>> Performance of this approach should be highly scalable.
>
> That basic idea is what unique1d() does; however, it uses numpy
> primitives to keep the heavy lifting in C instead of Python.
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
>   -- Umberto Eco
>
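
For comparison, the sort-based idea from the quoted messages can be sketched with NumPy primitives so that the sorting and the neighbour comparison both run in C. This is only an illustration of the technique, not the actual unique1d() source; the function name unique_rows_sorted and the sample data are assumptions.

```python
import numpy as np

def unique_rows_sorted(a):
    """Return the distinct rows of a 2-D array in sorted order (illustrative helper)."""
    a = np.asarray(a)
    if a.shape[0] == 0:
        return a
    # np.lexsort treats the LAST key as the primary one, so the columns are
    # reversed to sort by column 0 first, then column 1, and so on.
    order = np.lexsort(a.T[::-1])
    s = a[order]
    # A row is kept if it differs from the row before it in at least one column.
    keep = np.ones(len(s), dtype=bool)
    keep[1:] = np.any(s[1:] != s[:-1], axis=1)
    return s[keep]

# Hypothetical sample data: (x, y) pairs with repeats.
a = np.array([(1, 2), (3, 4), (1, 2), (5, 6), (3, 4)])
a_unique = unique_rows_sorted(a)
```

In current NumPy releases, np.unique(a, axis=0) gives the same sorted result in a single call.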
