[SciPy-User] String Matching in SciPy

Lorenzo Isella lorenzo.isella@gmail....
Sat Oct 9 08:21:15 CDT 2010


Dear All,
Please consider a text file like the one you can download from

http://dl.dropbox.com/u/5685598/time_series25_.dat

where every element is simply the output of a hash function. The file
represents a time series of length n.
Consider position i in the file: the "past" of i consists of entries
[0:i], whereas the "future" of i (which includes i itself) consists of
positions [i:n].
My goal is to find the length of the shortest substring in the future
of i that has not already been seen in its past.
Consider this example:

series  a b c c
     i  0 1 2 3

which would lead to

i 0 1 2 3
L 1 1 1 2
where L, the length I am looking for, is computed for each position i.
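One reading of the definition that reproduces this table: L at position i
is one more than the length of the longest prefix of series[i:] that occurs
as a contiguous run somewhere inside series[:i]. At i = 3 this gives 2 even
though only one symbol is left, because "c" has already been seen. The
sketch below is a deliberately naive version of that reading (roughly cubic
in n), useful mainly as a reference for checking faster code; the function
names are hypothetical.

```python
def occurs_in(haystack, needle):
    """True if `needle` appears as a contiguous run inside `haystack`."""
    m = len(needle)
    return any(haystack[k:k + m] == needle
               for k in range(len(haystack) - m + 1))


def shortest_unseen_lengths(series):
    """Naive L for every position i (reference implementation)."""
    n = len(series)
    lengths = []
    for i in range(n):
        past = series[:i]
        L = 1
        # Grow the candidate series[i:i+L] while it is still found in the
        # past and there is still future left to extend it with.
        while i + L <= n and occurs_in(past, series[i:i + L]):
            L += 1
        lengths.append(L)
    return lengths


print(shortest_unseen_lengths(list("abcc")))   # [1, 1, 1, 2]
```

It works on strings as well as on lists of integer or string tokens, since
it relies only on slicing and equality.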
At the end of the day, I need a sort of built-in grep function for
Python, but the first step is to find out if there is an efficient way
to check whether a given substring (in the future of i) is contained in
the string formed by the past of i.
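One standard data structure for this containment test is a suffix
automaton: once the past series[:i] has been loaded into it, checking
whether a candidate substring occurs in the past costs one dictionary
lookup per symbol, and the automaton can be extended by one symbol at a
time as i advances (the whole structure for n symbols is built in roughly
linear time). The sketch below is a suggestion under those assumptions, not
something taken from SciPy; `SuffixAutomaton` and
`shortest_unseen_lengths_sam` are hypothetical names, and L follows the same
"one more than the longest match" reading that reproduces the table above.
Walking from the initial state is still quadratic in the worst case (for
example a long run of one repeated symbol), but each step is a dict lookup
rather than a rescan of the past.

```python
class SuffixAutomaton:
    """Online suffix automaton over arbitrary hashable tokens (hypothetical helper)."""

    def __init__(self):
        self.next = [{}]     # transitions: state -> {token: state}
        self.link = [-1]     # suffix links
        self.length = [0]    # length of the longest string in each state
        self.last = 0        # state representing the whole text so far

    def extend(self, c):
        """Append token c to the text represented by the automaton."""
        cur = len(self.next)
        self.next.append({})
        self.length.append(self.length[self.last] + 1)
        self.link.append(0)
        p = self.last
        while p != -1 and c not in self.next[p]:
            self.next[p][c] = cur
            p = self.link[p]
        if p != -1:
            q = self.next[p][c]
            if self.length[p] + 1 == self.length[q]:
                self.link[cur] = q
            else:
                # Split state q by cloning it with a shorter length.
                clone = len(self.next)
                self.next.append(dict(self.next[q]))
                self.length.append(self.length[p] + 1)
                self.link.append(self.link[q])
                while p != -1 and self.next[p].get(c) == q:
                    self.next[p][c] = clone
                    p = self.link[p]
                self.link[q] = clone
                self.link[cur] = clone
        self.last = cur


def shortest_unseen_lengths_sam(series):
    """L_i = 1 + length of the longest prefix of series[i:] found in series[:i]."""
    sam = SuffixAutomaton()   # holds the past series[:i]
    n = len(series)
    lengths = []
    for i in range(n):
        state, m = 0, 0
        # Every path from the initial state spells a substring of the past.
        while i + m < n and series[i + m] in sam.next[state]:
            state = sam.next[state][series[i + m]]
            m += 1
        lengths.append(m + 1)
        sam.extend(series[i])   # position i now becomes part of the past
    return lengths


print(shortest_unseen_lengths_sam([17, 3, 42, 42]))   # [1, 1, 1, 2]
```

The tokens only need to be hashable, so integer hash values read from the
file can be fed in directly without converting them to characters.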
Any suggestion is welcome.
Cheers

Lorenzo

