[SciPy-User] String Matching in SciPy
Lorenzo Isella
lorenzo.isella@gmail....
Sat Oct 9 08:21:15 CDT 2010
Dear All,
Please consider a text file like the one you can download from
http://dl.dropbox.com/u/5685598/time_series25_.dat
where every element is simply the output of a hash function. The file
represents a time series whose length I will call n.
Consider position i along the file; the "past" of i is given by entries
[0:i] along the file, whereas the "future" of i (which includes i
itself) is given by the [i:n] positions.
My goal is to find the length of the shortest substring in the future
of i that has not already been seen in its past.
Consider this example:
    series  a  b  c  c
    i       0  1  2  3
which would lead to
    i       0  1  2  3
    L       1  1  1  2
where L is the length I am looking for, calculated for each choice of i.
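The definition above can be sketched in plain Python as a brute-force reference. This is only an illustration, not an efficient solution: the function names are hypothetical, and it assumes that "seen in the past" means the candidate occurs contiguously and entirely within entries [0:i]. It also assumes, to reproduce L = 2 at i = 3 in the example, that when every remaining entry has already been seen, L is one more than the longest match.

```python
def occurs(candidate, past):
    """Return True if `candidate` appears as a contiguous run inside `past`."""
    k = len(candidate)
    # An empty range (past shorter than candidate) makes any() return False.
    return any(past[j:j + k] == candidate for j in range(len(past) - k + 1))


def shortest_unseen_lengths(series):
    """For each i, length of the shortest run starting at i not found in series[:i].

    Assumption: if the run reaches the end of the series while still matching,
    the result is the matched length plus one (as in the a b c c example).
    """
    n = len(series)
    lengths = []
    for i in range(n):
        past = series[:i]
        length = 1
        # Grow the candidate while it fits in the series and occurs in the past.
        while i + length <= n and occurs(series[i:i + length], past):
            length += 1
        lengths.append(length)
    return lengths


print(shortest_unseen_lengths(["a", "b", "c", "c"]))  # [1, 1, 1, 2]
```

The cost grows quickly with n, so this is mainly useful for checking a faster implementation against small inputs.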
Ultimately, I need something like a built-in grep function for Python,
but the first step is to find out whether there is an efficient way to
detect whether a given substring (taken from the future of i) occurs
within the string formed by the past of i.
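As a rough sketch of that first step: for ordinary Python strings the `in` operator already tests for a contiguous substring. For a series of hash values held in a list, one possibility (an assumption about the data layout, with hypothetical function names) is to collect every length-k run of the past into a set of tuples and test membership.

```python
def windows(seq, k):
    """Return the set of all contiguous length-k runs of `seq`, as tuples."""
    return {tuple(seq[j:j + k]) for j in range(len(seq) - k + 1)}


def occurs_in_past(series, i, k):
    """Return True if series[i:i+k] also occurs contiguously within series[:i]."""
    candidate = tuple(series[i:i + k])
    return candidate in windows(series[:i], k)


# Plain strings: `in` performs a substring test directly.
print("bc" in "abc")   # True
print("cb" in "abc")   # False

# Lists of tokens (for example hash values read from the file).
series = ["a", "b", "c", "c"]
print(occurs_in_past(series, 3, 1))  # True: "c" was already seen at position 2
print(occurs_in_past(series, 2, 1))  # False: "c" is new at position 2
```

Rebuilding the set for every i is wasteful. If it were kept and extended as i advances, each test would be a single hash lookup, at the price of more memory.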
Any suggestion is welcome.
Cheers
Lorenzo