# [SciPy-User] Identify unique sequence data from array

Charles R Harris charlesr.harris@gmail....
Wed Dec 22 17:00:36 CST 2010

```On Wed, Dec 22, 2010 at 2:51 PM, Robert Kern <robert.kern@gmail.com> wrote:

> On Wed, Dec 22, 2010 at 16:47, Paul Anton Letnes
> <paul.anton.letnes@gmail.com> wrote:
> >
> > On 22. des. 2010, at 21.18, otrov wrote:
> >
> >>>> The problem:
> >>
> >>>> I have 2D data sets (scipy/numpy arrays) of 10^7 to 10^8 rows, which
> consist of repeated copies of one unique sequence, usually ~10^5 rows,
> but may differ in scale. The period is the same for both columns, so there is
> no real difference whether we consider a 2D or a 1D array.
> >>>> I want to track this data block.
> >>
> >>> for i in range(1, len(X)-1):
> >>>    if (X[i:] == X[:-i]).all():
> >>>        break
> >>
> >> Just look at that Python beauty! Such a great language when in the hands of a
> smart user.
> >> Thanks for your snippet, but unfortunately it takes forever to finish the
> >
> > You could also check one element at a time. I think it will be faster,
> because it will break if comparison of the first element doesn't hold. Then,
> if you find such an occurrence, use Robert's method to double check that you
> found the true repetition period.
>
> Excellent point.
>
>
Why not do an FFT and look at the shape around the carrier frequency? The DC
level should probably be subtracted first. It should also be possible to
construct a Wiener filter to extract the sequences if they don't occur with
strict periods.

Chuck
```
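
The FFT suggestion can be combined with the exact comparison quoted earlier in the thread. What follows is only a rough sketch under stated assumptions, not code anyone in the thread posted: it assumes 1D data (one column, since the period is said to be the same for both), an exactly repeating block, and NumPy. The function names `is_period` and `estimate_period_fft` and the parameters `max_harmonic` and `slack` are made up for illustration. The idea is to subtract the DC level, take the strongest spectral peak as a guess at the repetition frequency, and then confirm a few nearby candidate periods with the slice comparison, so the expensive check runs only a handful of times.

```python
import numpy as np


def is_period(x, p):
    """Exact check from the thread: x shifted by p rows equals itself."""
    return 0 < p < len(x) and np.array_equal(x[p:], x[:-p])


def estimate_period_fft(x, max_harmonic=8, slack=2):
    """Guess a repetition period of 1D data from its spectrum, then verify it.

    max_harmonic and slack are illustrative knobs (assumptions, not from
    the thread). Returns a verified period in samples, or None.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n < 4:
        return None
    # Subtract the DC level so that bin 0 does not dominate the spectrum.
    spectrum = np.abs(np.fft.rfft(x - x.mean()))
    # Strongest non-DC bin; bin k corresponds to a period of about n / k.
    k = int(np.argmax(spectrum[1:])) + 1
    base = n / k
    # The strongest peak may be a harmonic of the repetition frequency,
    # so multiples of the base guess are tried as well.
    for m in range(1, max_harmonic + 1):
        center = int(round(m * base))
        # The peak position is only accurate to about one bin, so a few
        # neighbouring candidates are checked exactly.
        for p in range(max(1, center - slack), center + slack + 1):
            if is_period(x, p):
                return p
    return None
```

For example, `estimate_period_fft(np.tile(np.arange(50), 20))` should find the 50-sample block. The result is a period that passed the exact check among the candidates tried; the sketch does not guarantee it is the shortest one, and it does not cover the case of blocks that recur at irregular intervals, where something like the Wiener filter idea would be needed.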