[Numpy-discussion] searching binary data
Wed Sep 22 09:40:24 CDT 2010
On Wed, Sep 22, 2010 at 11:25 PM, Neal Becker <firstname.lastname@example.org> wrote:
> David Cournapeau wrote:
>> On Wed, Sep 22, 2010 at 11:10 PM, Neal Becker <email@example.com> wrote:
>>> A colleague of mine posed the following problem. He wants to search
>>> large files of binary data for sequences.
>> Is there a reason why you cannot use one of the classic string search
>> algorithms applied to the bytestream?
> What would you suggest? Keep in mind the file is too big to fit into memory
> all at once.
Do you care about speed? String search and even regular expressions
are supposed to work on mmap'd data, but I have never used them on large
datasets, so I don't know how they would perform.
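
For illustration, here is a minimal sketch of searching a memory-mapped file,
using only the standard library. The function names (find_all_mmap,
finditer_mmap) are made up for this example, and it assumes a non-empty file
and a pattern given as bytes. I have not benchmarked it on large files.

```python
import mmap
import re


def find_all_mmap(path, pattern):
    """Return the byte offsets of every occurrence of `pattern` (bytes) in the file."""
    offsets = []
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the OS pages data in on demand,
        # so the file is not read into memory up front.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(pattern)
            while pos != -1:
                offsets.append(pos)
                # Step by one byte so overlapping occurrences are found too.
                pos = mm.find(pattern, pos + 1)
    return offsets


def finditer_mmap(path, regex):
    """Return the start offsets of non-overlapping matches of a bytes regex."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # The re module accepts any bytes-like object, including an mmap.
            return [m.start() for m in re.finditer(regex, mm)]
```

For example, find_all_mmap("data.bin", b"\xde\xad\xbe\xef") would list every
offset of that 4-byte sequence, where "data.bin" is a placeholder file name.
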
Otherwise, depending on the data and whether you can afford
pre-computing, algorithms like Knuth-Morris-Pratt can speed things up.
But I would assume you would have to do it in C to hope for any speed gain
compared to Python string search.
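
To make the Knuth-Morris-Pratt idea concrete, here is a rough pure-Python
sketch that reads the file in chunks, so the whole file never has to be in
memory. The names build_failure_table and kmp_search_stream, and the default
chunk size, are my own choices for this example. As noted above, a pure-Python
loop like this is unlikely to beat the built-in string search; it only shows
the pre-computation and how the match state carries across chunk boundaries.

```python
def build_failure_table(pattern):
    """For each prefix of `pattern`, the length of its longest proper
    prefix that is also a suffix (the KMP pre-computation step)."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_search_stream(fileobj, pattern, chunk_size=1 << 20):
    """Yield the byte offset of every occurrence of `pattern` (bytes)
    in a file object opened in binary mode."""
    if not pattern:
        return
    table = build_failure_table(pattern)
    k = 0        # number of pattern bytes currently matched
    offset = 0   # absolute position of the byte being examined
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        for byte in chunk:
            # On a mismatch, fall back using the table instead of rescanning.
            while k > 0 and byte != pattern[k]:
                k = table[k - 1]
            if byte == pattern[k]:
                k += 1
            if k == len(pattern):
                yield offset - len(pattern) + 1
                k = table[k - 1]  # keep going to catch overlapping matches
            offset += 1
```

Because the match state k survives from one chunk to the next, a sequence that
straddles a chunk boundary is still found without re-reading any bytes.
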