[Numpy-discussion] Fastest way to parse a specific binary file
Wed Sep 2 11:52:35 CDT 2009
On Wed, Sep 2, 2009 at 10:11 AM, Robert Kern <firstname.lastname@example.org> wrote:
> On Wed, Sep 2, 2009 at 09:38, Gökhan Sever<email@example.com> wrote:
> > Hello,
> > I want to be able to parse a binary file which holds information about
> > the experiment configuration and, obviously, the data. Both the configuration
> > and data sections are variable-length. A chunk of this data is shown below
> > (after a binary read operation):
> > '\x00\x00@
> > Version = 1\n', 'ProjectName = PME1 2009 King Air N825ST\n', 'FlightId =
> > \n', 'AircraftType = WMI King Air 200\n', 'AircraftId = N825ST\n',
> > 'OperatorName = Weather Modification Inc.\n', 'Comments = \n', '\x00\x00@
> > In binary form the file is 1.3MB, and when written to a txt file it grows
> > to 3.7MB, totalling approximately 4 million characters. When fully processed
> > (with an IDL code) it produces 86 separate configuration files and 46
> > files for data, covering about 10-15 different instruments in various
> > combinations and sampling rates.
> > I attempted to use the re module; however, the time it takes to parse the
> > file is much longer than I expected. What would be the wisest and fastest way to
> > tackle this issue? Upon successful reconstruction of the data and configuration,
> > I am planning to use a more modular structure like HDF5 or netCDF4 for
> > easy data storage and analysis.
> Are there fixed delimiters? Like '\x00\x00@\x00' perhaps? It might be
> faster to search for those using .find() instead of regexes.
> Without more information about how the file format gets split up, I'm
> not sure we can make good suggestions.
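A minimal sketch of the .find() approach suggested above, assuming the delimiter really is '\x00\x00@\x00' (that is only a guess at this point, not something confirmed from the file format). The function name split_on_delimiter and the sample bytes are made up for illustration:

```python
def split_on_delimiter(buf, delim=b"\x00\x00@\x00"):
    """Return the pieces of buf that lie between occurrences of delim."""
    chunks = []
    start = 0
    while True:
        pos = buf.find(delim, start)  # plain substring search, no regex engine
        if pos == -1:
            chunks.append(buf[start:])  # trailing piece after the last delimiter
            break
        chunks.append(buf[start:pos])
        start = pos + len(delim)
    return chunks


# Made-up sample bytes, not taken from the real file.
sample = b"Version = 1\n\x00\x00@\x00AircraftId = N825ST\n\x00\x00@\x00tail"
chunks = split_on_delimiter(sample)
```

For a single fixed delimiter this gives the same result as sample.split(delim); the explicit loop is only useful if the offsets (pos) are needed later, for example to read a length field that sits next to each delimiter.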
> Robert Kern
Fixed delimiters... That is what I used to parse the metadata with a regex:
    r = re.compile("\0;.+?\0\0@\0\$", re.DOTALL)
which extracts the portions that I am interested in. However, I have yet to
figure out how to parse the separate data streams. I couldn't find a way to
see which data blocks go with which device.
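Once a metadata block has been extracted, the 'Key = value\n' lines shown in the dump above can be turned into a dictionary without any regex. This is only a sketch: the helper name parse_config is hypothetical, and it assumes every configuration line uses a single '=' as the separator between key and value, which is what the sample suggests but is not verified for the whole file:

```python
def parse_config(lines):
    """Turn 'Key = value\\n' strings into a dict of stripped strings."""
    config = {}
    for line in lines:
        key, sep, value = line.partition("=")  # split on the first '=' only
        if not sep:
            continue  # skip anything that is not a key/value line
        config[key.strip()] = value.strip()
    return config


# The configuration lines from the sample chunk quoted earlier in the thread.
lines = [
    "Version = 1\n",
    "ProjectName = PME1 2009 King Air N825ST\n",
    "FlightId = \n",
    "AircraftType = WMI King Air 200\n",
    "AircraftId = N825ST\n",
    "OperatorName = Weather Modification Inc.\n",
    "Comments = \n",
]
config = parse_config(lines)
```

Empty fields such as FlightId and Comments come out as empty strings, so they can be told apart from keys that are missing altogether.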
I put the test binary file I am using at: