[SciPy-User] String Matching in SciPy
Sat Oct 9 10:03:52 CDT 2010
Sat, 09 Oct 2010 15:21:15 +0200, Lorenzo Isella wrote:
> where L is the length I am looking for calculated for various choices of
> i. In the end of the day, I need a sort of built-in grep function for
> Python, but the first step is to understand if there is an efficient way
> to detect whether a certain substring (in the future of i) is a subset
> of the string giving the past of i.
> Any suggestion is welcome.
As far as I know, there's no built-in function in NumPy for doing this.
There are probably several ways to proceed, among them:
One option: Python's re module also works with buffers, so you can
use it directly on character arrays:
>>> import numpy as np
>>> import re
>>> x = np.array(list('asdasdasds'), dtype='S1')
>>> x
array(['a', 's', 'd', 'a', 's', 'd', 'a', 's', 'd', 's'],
      dtype='|S1')
>>> re.search('sda', x[:4]).start()
1
This does not copy the data to a string, so it should be efficient.
If you need to find all occurrences, you can do
>>> matches = re.finditer('sda', x)
>>> offsets = [m.start() for m in matches]
If you have a large number of matches, this approach may become
less efficient, as it needs to form a Python match object for each
match.
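
For readers on Python 3, here is a minimal sketch of the same regexp approach. It assumes that the pattern has to be given as bytes when the subject is a buffer, and that a contiguous 'S1' array is accepted by re as a bytes-like object. The second array `y` is made up purely for illustration: it shows that finditer reports only non-overlapping matches, while a zero-width lookahead reports every start position.

```python
import re
import numpy as np

# Same data as in the session above, one byte per element.
x = np.array(list('asdasdasds'), dtype='S1')

# On Python 3 the pattern must be bytes when searching a buffer.
pattern = re.compile(b'sda')

# Offset of the first match inside the first four characters.
first = pattern.search(x[:4]).start()

# Offsets of all non-overlapping matches in the whole array.
offsets = [m.start() for m in pattern.finditer(x)]

# Illustrative data (not from the original post): overlapping matches.
y = np.array(list('aaaa'), dtype='S1')
plain = [m.start() for m in re.finditer(b'aa', y)]
# A lookahead consumes no characters, so every start position is reported.
overlapping = [m.start() for m in re.finditer(b'(?=aa)', y)]

print(first, offsets, plain, overlapping)
```

Whether overlapping matches matter depends on the question being asked; for a plain "does this substring occur in the past" test, a single search call is enough.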
Another option: write a simple function in Cython that does the string
matching and returns an integer array of offsets.
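
As a rough illustration of the interface such a function could have, here is a sketch in plain NumPy rather than Cython. The name `find_offsets` and its signature are hypothetical; the body is a naive position-by-position comparison, vectorized over the haystack, and not an optimized string-matching algorithm. A Cython version would presumably replace the masking with explicit loops.

```python
import numpy as np

def find_offsets(haystack, needle):
    """Return an integer array with the start index of every match.

    Hypothetical helper: both arguments are 1-D 'S1' character arrays.
    Overlapping matches are included.
    """
    # Reinterpret the one-byte characters as integers without copying.
    h = np.asarray(haystack).view(np.uint8)
    p = np.asarray(needle).view(np.uint8)
    n, m = h.shape[0], p.shape[0]
    if m == 0 or m > n:
        return np.empty(0, dtype=np.intp)
    # mask[i] stays True only if needle matches haystack[i:i+m].
    mask = np.ones(n - m + 1, dtype=bool)
    for j in range(m):
        mask &= h[j:n - m + 1 + j] == p[j]
    return np.flatnonzero(mask)

x = np.array(list('asdasdasds'), dtype='S1')
needle = np.array(list('sda'), dtype='S1')
print(find_offsets(x, needle))
```

This avoids creating one match object per hit, at the cost of one temporary boolean array per needle character.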