[Numpy-discussion] Finding unique rows in an array

Jouni K. Seppänen jks@iki...
Fri Aug 24 22:08:33 CDT 2007


Francesc Altet <faltet@carabos.com> writes:

> On Tuesday 21 August 2007, Mark.Miller wrote:
>> Is there a good loopless way to identify all of the unique rows in an
>> array?  Something like numpy.unique() is ideal, but capable of
>> extracting unique subarrays along an axis.
>
> You can always do a view of the rows as strings and then use unique().
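
A minimal sketch of that view trick, for reference. It assumes a 2-D
array and reinterprets each row as one opaque item of raw bytes (a void
dtype rather than a string dtype, so that trailing zero bytes are not
stripped). The function name unique_rows_by_view is made up for this
illustration. Rows are compared byte by byte, so with floats 0.0 and
-0.0 would count as different rows.

```python
import numpy as np

def unique_rows_by_view(a):
    """Return the unique rows of a 2-D array, in order of first appearance."""
    # A view needs each row to occupy one contiguous run of bytes.
    a = np.ascontiguousarray(a)
    # Reinterpret every row as a single item of raw bytes.
    row_dtype = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    rows_as_bytes = a.view(row_dtype).ravel()
    # unique() now compares whole rows; keep the index of each first occurrence.
    _, first_idx = np.unique(rows_as_bytes, return_index=True)
    return a[np.sort(first_idx)]

a = np.array([[1, 2], [3, 4], [1, 2], [5, 6], [3, 4]])
unique_rows = unique_rows_by_view(a)
```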

For large arrays it probably makes sense to hash the rows by taking a
dot product with a random vector. Then sort the hash values and identify
blocks of equal values (allowing for rounding errors). Rows whose hash
values differ by more than the rounding error are guaranteed to be
different; within a block of rows with the same hash value you'll still
have to compare the rows themselves, but this will probably be much less
work than comparing every pair of rows, and (I hope) BLAS makes the
dot-product phase go fast.
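
Roughly, the idea could be sketched as follows. This is only an
illustration under some assumptions: the function name
unique_rows_by_hash, the seed parameter, and the relative tolerance
of 1e-9 are all made up here, the input is assumed to be a non-empty
2-D numeric array, and the exact check inside each block simply uses
np.unique(..., axis=0).

```python
import numpy as np

def unique_rows_by_hash(a, seed=0, rtol=1e-9):
    """Unique rows of a non-empty 2-D numeric array, in order of first appearance."""
    a = np.asarray(a)
    rng = np.random.default_rng(seed)
    # Hash every row with one matrix-vector product.
    v = rng.standard_normal(a.shape[1])
    h = a.astype(float) @ v

    # Sort the hashes; a gap larger than the tolerance starts a new block.
    order = np.argsort(h, kind="stable")
    tol = rtol * max(1.0, np.abs(h).max())
    breaks = np.flatnonzero(np.diff(h[order]) > tol) + 1
    starts = np.concatenate(([0], breaks))
    ends = np.concatenate((breaks, [len(order)]))
    sizes = ends - starts

    # A row alone in its block cannot equal any other row.
    keep = list(order[starts[sizes == 1]])
    # Blocks with several rows still need an exact comparison.
    for s, e in zip(starts[sizes > 1], ends[sizes > 1]):
        block = np.sort(order[s:e])  # original row indices, ascending
        _, first = np.unique(a[block], axis=0, return_index=True)
        keep.extend(block[first])
    return a[np.sort(np.array(keep, dtype=np.intp))]

a = np.array([[1, 2], [3, 4], [1, 2], [5, 6], [3, 4]])
unique_rows = unique_rows_by_hash(a)
```

The Python loop only visits blocks that contain more than one row, so
when most rows are distinct nearly all of the work is in the dot
product and the sort.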

-- 
Jouni K. Seppänen
http://www.iki.fi/jks


