[Numpy-discussion] Fastest way to parse a specific binary file
Wed Sep 2 13:42:06 CDT 2009
On Wed, Sep 2, 2009 at 12:46 PM, Robert Kern <firstname.lastname@example.org> wrote:
> On Wed, Sep 2, 2009 at 12:33, Gökhan Sever<email@example.com> wrote:
> > How does your find suggestion work? It just returns the location of the
> > first occurrence.
> str.find(sub[, start[, end]])
> Return the lowest index in the string where substring sub is
> found, such that sub is contained in the range [start, end]. Optional
> arguments start and end are interpreted as in slice notation. Return
> -1 if sub is not found.
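Because `find` accepts a `start` argument, calling it in a loop and restarting just past each hit yields every occurrence, not only the first. A minimal sketch in Python 3, working on `bytes` (whose `find` method has the same signature as `str.find`); `find_all` is a hypothetical helper name, not a library function:

```python
def find_all(data: bytes, sub: bytes) -> list[int]:
    """Return the start offset of every (possibly overlapping) occurrence of sub."""
    offsets = []
    pos = data.find(sub)
    while pos != -1:
        offsets.append(pos)
        # Resume the search one byte past the previous hit.
        pos = data.find(sub, pos + 1)
    return offsets

print(find_all(b"ab\x00\x00@ab\x00\x00@", b"\x00\x00@"))  # [2, 7]
```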
> But perhaps you should profile your code to see where it is actually
> taking up the time. Regexes on 1.3 MB of data should be quite fast.
> In : marker = '\x00\x00@\x00$\x00\x02'
> In : block = marker + '\xde\xca\xfb\xad' * ((1024-8) // 4)
> In : data = int(round(1.3 * 1024)) * block
> In : import re
> In : r = re.compile(re.escape(marker))
> In : %time r.findall(data)
> CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s
> Wall time: 0.01 s
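A Python 3 version of the same measurement might look like the sketch below, using `bytes` and the standard `timeit` module. It uses `re.finditer` rather than `findall`: every match of a fixed marker is identical, so the offsets are the useful result for slicing the file afterwards. The marker bytes are an assumption, combining the `\0\0@\0$` sequence from the pattern in the reply below with the `\x00\x02` tail from the session above; the data is synthetic.

```python
import re
import timeit

# Assumed marker bytes; adjust to the real file format.
marker = b"\x00\x00@\x00$\x00\x02"
# Synthetic test data: 1331 blocks of roughly 1 KiB, each starting with the marker.
block = marker + b"\xde\xca\xfb\xad" * ((1024 - 8) // 4)
data = round(1.3 * 1024) * block

pattern = re.compile(re.escape(marker))

def marker_offsets(buf: bytes) -> list[int]:
    # Start offset of every non-overlapping match of the marker.
    return [m.start() for m in pattern.finditer(buf)]

offsets = marker_offsets(data)
print(len(offsets))  # 1331
print(timeit.timeit(lambda: marker_offsets(data), number=10) / 10, "s per call")
```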
> Robert Kern
This is what I have been using. It doesn't return exactly what I want, but
it's very close, apart from being slow:
I: mypattern = re.compile('\0\0\1\0.+?\0\0@\0\$', re.DOTALL)
I: res = mypattern.findall(ss)
I: len(res)
I: %time mypattern.findall(ss);
CPU times: user 9.14 s, sys: 0.00 s, total: 9.14 s
Wall time: 9.16 s
*prj.300*\x00; Version = 1\nProjectName = PME1 2009 King Air N825ST\nFlightId = \nAircraftType = WMI King Air 200\nAircraftId = N825ST\nOperatorName = Weather Modification Inc.\nComments = \n\x00\x00@
I need the part starting with the bold-typed section (prj.300) through the
end of the section. I need the bold part because I can construct file names
from it and write the content that follows into those files.
Oh, and when it works, the search should return 86 occurrences.
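One possible alternative to the lazy `.+?` regex is to locate the delimiters with `bytes.find` and cut the sections out by slicing; whether that is faster here should be confirmed by profiling. This sketch rests on loud assumptions about the layout: each section starts with `START = b'\x00\x00\x01\x00'` and ends at the next `END = b'\x00\x00@\x00$'` (both copied from the regex above), and the name such as `prj.300` is the NUL-terminated text right after `START`. The real header may differ. `split_sections` and `write_sections` are hypothetical helpers.

```python
import os
from pathlib import Path

# Assumed delimiters, copied from the regex in this message.
START = b"\x00\x00\x01\x00"
END = b"\x00\x00@\x00$"

def split_sections(data: bytes) -> list[tuple[str, bytes]]:
    """Return (name, body) pairs for each START ... END section (hypothetical layout)."""
    sections = []
    pos = data.find(START)
    while pos != -1:
        body_start = pos + len(START)
        # +1 mirrors the `.+?` in the regex, which needs at least one byte.
        end = data.find(END, body_start + 1)
        if end == -1:
            break  # no terminator after this start; nothing more to match
        body = data[body_start:end]
        # Assumption: the section name is the text up to the first NUL byte.
        name = body.split(b"\x00", 1)[0].decode("latin-1")
        sections.append((name, body))
        pos = data.find(START, end + len(END))
    return sections

def write_sections(sections, outdir: str) -> None:
    """Write each section body to outdir/<name>, with the name made filesystem-safe."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    for name, body in sections:
        safe = name.strip("*").replace(os.sep, "_") or "unnamed"
        (out / safe).write_bytes(body)

# Usage (file name is hypothetical):
# with open("input.bin", "rb") as f:
#     sections = split_sections(f.read())
# print(len(sections))  # the message above expects 86
# write_sections(sections, "out")
```

Checking `len(sections)` against the expected 86 is a quick way to validate the assumed delimiters on the real file.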