[SciPy-User] Identify unique sequence data from array

Paul Anton Letnes paul.anton.letnes@gmail....
Wed Dec 22 15:47:30 CST 2010


On 22 Dec 2010, at 21:18, otrov wrote:

>>> The problem:
> 
>>> I have 2D data sets (scipy/numpy arrays) of 10^7 to 10^8 rows, which consist of repetitions of one unique sequence, usually ~10^5 rows long, though the length may vary. The period is the same for both columns, so it makes no real difference whether we consider a 2D or a 1D array.
>>> I want to track this data block.
> 
>> for i in range(1, len(X)-1):
>>    if (X[i:] == X[:-i]).all():
>>        break
> 
> Just look at that python beauty! Such a great language in the hands of a smart user.
> Thanks for your snippet, but unfortunately it takes forever to finish the task.

You could also check one element at a time. I think it will be faster, because a candidate period is rejected as soon as the comparison with the first element does not hold. If a candidate passes, use Robert's method to double-check that you have found the true repetition period.

Code:
>>> import numpy
>>> a = numpy.array([1,2,3,4,1,2,3,4,1,2,3,4])
>>> for i in range(1, 1 + a.size//2):
...     # candidate period i: every i-th element must equal the first one
...     if (a[::i] == a[0]).all(): print('period is %d' % i)
...
period is 4
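
The two steps (cheap first-element test, then Robert's full comparison) could be combined roughly as below. This is only a sketch, not tested on arrays of the size discussed in this thread. It assumes a 1D array, assumes the data holds at least two full repetitions, and replaces the loop over strided slices with a single vectorized comparison against the first element. The name find_period is made up for this illustration.

```python
import numpy as np

def find_period(x):
    """Return the smallest shift p <= len(x)//2 with x[p:] == x[:-p], else None.

    Hypothetical helper for illustration; assumes x is one-dimensional.
    """
    x = np.asarray(x)
    # Cheap filter: a true period p must satisfy x[p] == x[0].
    # Only shifts up to half the length are considered, so that at
    # least two full repetitions are present in the data.
    candidates = np.flatnonzero(x[1:x.size // 2 + 1] == x[0]) + 1
    for p in candidates:
        # Full check (Robert's comparison): the array shifted by p
        # must agree with itself everywhere.
        if np.array_equal(x[p:], x[:-p]):
            return int(p)
    return None

a = np.array([1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4])
print(find_period(a))  # prints 4
```

The full comparison then runs only for shifts where the first element recurs, which should be few if the first value is rare within one period. If it is a common value, many candidates survive the filter and this will not be much faster than the plain loop.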


Cheers
Paul.


