[Numpy-discussion] unique rows of array

Maria Liukis liukis@usc....
Mon Aug 17 23:30:46 CDT 2009


Hello everybody,

While re-implementing some Matlab code in Python, I've run into the
problem of finding a NumPy function analogous to Matlab's
"unique(array, 'rows')", which returns the unique rows of an array.
Searching the web, I've found a similar discussion from a couple of
years ago with an example:


############## A SNIPPET FROM THE DISCUSSION
[Numpy-discussion] Finding unique rows in an array [Was: Finding a  
row match within a numpy array]
On Tuesday 21 August 2007, Mark.Miller wrote:
 > A slightly related question on this topic...
 >
 > Is there a good loopless way to identify all of the unique rows in an
 > array?  Something like numpy.unique() is ideal, but capable of
 > extracting unique subarrays along an axis.

You can always do a view of the rows as strings and then use unique().
Here is an example:

In [1]: import numpy
In [2]: a=numpy.arange(12).reshape(4,3)
In [3]: a[2]=(3,4,5)
In [4]: a
Out[4]:
array([[ 0,  1,  2],
        [ 3,  4,  5],
        [ 3,  4,  5],
        [ 9, 10, 11]])

now, create the view and select the unique rows:

In [5]: b=numpy.unique(a.view('S%d'%a.itemsize*a.shape[0])).view('i4')

and finally restore the shape:

In [6]: b.reshape((len(b)/a.shape[1], a.shape[1]))
Out[6]:
array([[ 0,  1,  2],
        [ 3,  4,  5],
        [ 9, 10, 11]])

If you want to find unique columns instead of rows, do a transpose first
on the initial array.

################END OF DISCUSSION


The provided example works only because the array elements are row-sorted.
Changing the test array (called 'c' in my case) breaks it:

 >>> c
array([[ 0,  1,  2],
        [ 3,  4,  5],
        [ 3,  4,  5],
        [ 9, 10, 11]])
 >>> c[0] = (11, 10, 0)
 >>> c
array([[11, 10,  0],
        [ 3,  4,  5],
        [ 3,  4,  5],
        [ 9, 10, 11]])
 >>> b = np.unique(c.view('S%s' %c.itemsize*c.shape[0]))
 >>> b
array(['', '\x03', '\x04', '\x05', '\t', '\n', '\x0b'],
       dtype='|S4')
 >>> b.view('i4')
array([ 0,  3,  4,  5,  9, 10, 11])
 >>> b.reshape((len(b)/c.shape[1], c.shape[1])).view('i4')
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
ValueError: total size of new array must be unchanged
 >>>

The reshape fails because len(b) is 7, which is not a multiple of
c.shape[1] = 3.
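
One likely explanation for the element-wise result, offered as a sketch
rather than a confirmed diagnosis: in the format expression, % and * have
the same precedence and are evaluated left to right, so the string is
formatted first and then repeated. NumPy appears to have read the repeated
string as a 4-byte string type, judging by the dtype='|S4' shown above.
The values below are stand-ins for c.itemsize and c.shape, assuming 4-byte
integers:

```python
itemsize = 4   # stand-in for c.itemsize (assumes 4-byte integers)
nrows = 4      # stand-in for c.shape[0]
ncols = 3      # stand-in for c.shape[1]

# As written in the snippet: ('S%d' % itemsize) * nrows
as_written = 'S%d' % itemsize * nrows
print(as_written)   # S4S4S4S4

# A dtype string that spans one whole row needs parentheses
# and the number of columns, not the number of rows.
row_wide = 'S%d' % (itemsize * ncols)
print(row_wide)     # S12
```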

The suggested approach would work if each whole row were converted to
a single string, I guess. But from what I could gather,
numpy.ndarray.view() only reinterprets the data element-wise.

Before I start re-inventing the wheel, I was just wondering if using  
existing numpy functionality one could find unique rows in an array.
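
For comparison, NumPy releases from 1.13 onward accept an axis argument
in numpy.unique, which would cover this case directly. A short sketch,
assuming such a release is installed:

```python
import numpy as np

c = np.array([[11, 10, 0],
              [3, 4, 5],
              [3, 4, 5],
              [9, 10, 11]])

# axis=0 treats each row as one item; the result comes back
# with the rows in sorted (lexicographic) order.
u = np.unique(c, axis=0)
print(u)
```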


Many thanks in advance!
Masha
--------------------
liukis@usc.edu


