# [SciPy-User] Identify unique sequence data from array

otrov dejan.org@gmail....
Wed Dec 22 11:47:19 CST 2010

```
Hi,
I asked for help on three other lists, but since this problem apparently can't be solved easily in MATLAB/Octave(!?), I thought I would try SciPy/NumPy and perhaps benefit from Python being a more feature-rich, expressive language.

The problem:

I have 2D data sets (SciPy/NumPy arrays) of 10^7 to 10^8 rows, each consisting of one unique sequence repeated over and over. The sequence is usually ~10^5 rows long, but its length may vary. The period is the same for both columns, so it makes no real difference whether we treat the array as 2D or 1D.
I want to identify this repeated data block.

Simplified problem:

X = array([[1, 2],
[1, 2],
[2, 2],
[3, 1],
[2, 3],
[1, 2],
[1, 2],
[2, 2],
[3, 1],
[2, 3],
[1, 2],
[1, 2],
[2, 2],
[3, 1],
[2, 3],
...,
[1, 2],
[1, 2],
[2, 2],
[3, 1],
[2, 3]])

I would like to extract the repeated sequence:

Y = array([[1, 2],
[1, 2],
[2, 2],
[3, 1],
[2, 3]])

as a result.

Or presented more visually:

I want to identify unique sequence data:

A B C D D D A B C D D D A B C D D D
|_________| |_________| |_________|
|           |           |
unique      unique      unique
sequence    sequence    sequence
data        data        data

Thanks for your time

```
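One possible way to approach the question above, offered as a minimal sketch and not as a definitive solution: treat the block length as the smallest period `p` for which the array equals itself shifted by `p` rows. The function name `find_repeating_block` is hypothetical, and the sketch assumes the data start at the beginning of a block, repeat exactly (no noise), and contain at least two full repetitions.

```python
import numpy as np

def find_repeating_block(X):
    """Return the shortest leading block of rows whose repetition reproduces X.

    Hypothetical helper: assumes exact repetition starting at row 0 and
    at least two full repetitions in X.
    """
    X = np.asarray(X)
    n = len(X)
    # Only offsets where the first row occurs again can be a period.
    if X.ndim == 1:
        first_matches = X == X[0]
    else:
        first_matches = np.all(X == X[0], axis=1)
    candidates = np.flatnonzero(first_matches)[1:]   # drop offset 0
    candidates = candidates[candidates <= n // 2]    # need two repetitions
    for p in candidates:
        # X has period p if it equals itself shifted by p rows.
        if np.array_equal(X[p:], X[:-p]):
            return X[:p]
    return X  # no shorter period found

# The simplified problem from the message, with the block repeated 4 times.
Y = np.array([[1, 2], [1, 2], [2, 2], [3, 1], [2, 3]])
X = np.tile(Y, (4, 1))
block = find_repeating_block(X)
```

Each candidate check is a single vectorized comparison over the whole array, so the cost depends on how often the first row recurs inside one block. For 10^8 rows the temporary boolean arrays may be large, so processing a leading slice that is known to span a few repetitions could be a reasonable compromise.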