# [Numpy-discussion] unique 2d arrays

Tue Sep 21 12:29:03 CDT 2010

On Tue, Sep 21, 2010 at 1:55 AM, Peter Schmidtke wrote:
> Dear all,
>
> I'd like to know if there is a pythonic / numpy way of retrieving unique
> lines of a 2d numpy array.
>
> In a way I have this :
>
> [[409 152]
>  [409 152]
>  [409 152]
>  [409 152]
>  [409 152]
>  [409 152]
>  [409 152]
>  [409 152]
>  [409 152]
>  [409 152]
>  [409 152]
>  [426 193]
>  [431 129]]
>
> And I'd like to get this :
>
> [[409 152]
>  [426 193]
>  [431 129]]
>
>
> How can I do this without workarounds like string concatenation or such
> things? Numpy.unique flattens the whole array so it's not really of use
> here.
>

Here is one alternative:

I[15]: a = np.array([[409, 152], [409, 152], [426, 193], [431, 129]])

I[16]: np.array(list(set(tuple(i) for i in a.tolist())))
O[16]:
array([[409, 152],
       [426, 193],
       [431, 129]])
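
A minimal sketch of the same idea, with one caveat made explicit: a Python set has no guaranteed iteration order, so the rows are sorted here before rebuilding the array. For comparison it also uses the `axis` argument of `np.unique`, which was added in NumPy 1.13 and so was not available when this thread was written. The names `unique_set` and `unique_axis` are only illustrative.

```python
import numpy as np

a = np.array([[409, 152], [409, 152], [426, 193], [431, 129]])

# Set-of-tuples approach: each row becomes a hashable tuple, the set drops
# duplicates, and sorted() makes the row order deterministic.
unique_set = np.array(sorted(set(map(tuple, a.tolist()))))

# Assumes NumPy >= 1.13: np.unique can treat each row as one item via axis=0.
# The result is sorted lexicographically by row.
unique_axis = np.unique(a, axis=0)

print(unique_set)
print(unique_axis)
```

Both results contain the same three rows. If the original row order matters, neither variant preserves it.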

I[6]: %timeit np.unique(a.view([('',a.dtype)]*a.shape[1])).view(a.dtype).reshape(-1,a.shape[1])
10000 loops, best of 3: 51 us per loop

I[8]: %timeit np.array(list(set(tuple(i) for i in a.tolist())))
10000 loops, best of 3: 31.4 us per loop

# Try with a bigger array
I[9]: k = np.array((a.tolist()*50000))

I[10]: %timeit np.array(list(set(tuple(i) for i in k.tolist())))
1 loops, best of 3: 324 ms per loop

I[11]: %timeit np.unique(k.view([('',k.dtype)]*k.shape[1])).view(k.dtype).reshape(-1,k.shape[1])
1 loops, best of 3: 790 ms per loop
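
The `np.unique(... .view(...))` one-liner timed above is fairly dense, so here is a step-by-step sketch of what it does. The function name `unique_rows_view`, the explicit field names, and the `np.ascontiguousarray` call are additions for illustration; the one-liner itself assumes the input is already C-contiguous.

```python
import numpy as np

def unique_rows_view(a):
    """Return the sorted unique rows of a 2-D array (hypothetical helper)."""
    # Viewing rows as single records requires contiguous memory.
    a = np.ascontiguousarray(a)
    ncols = a.shape[1]

    # One structured dtype with a field per column, so that a whole row
    # is reinterpreted as a single record.
    row_dtype = np.dtype([("f%d" % i, a.dtype) for i in range(ncols)])
    rows = a.view(row_dtype)              # shape (n, 1), one record per row

    # np.unique flattens its input, but each element is now a full row.
    uniq = np.unique(rows)                # 1-D array of unique records

    # Reinterpret the records as plain numbers and restore the 2-D shape.
    return uniq.view(a.dtype).reshape(-1, ncols)

a = np.array([[409, 152], [409, 152], [426, 193], [431, 129]])
print(unique_rows_view(a))
```

Because `np.unique` sorts the records, much of the cost on large inputs is likely the sort, whereas the set-based version relies on hashing.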

On these tests the set-based approach seems faster than the unique method. Also it is
