[Numpy-discussion] Fastest way to parse a specific binary file

Gökhan Sever gokhansever@gmail....
Wed Sep 2 11:52:35 CDT 2009


On Wed, Sep 2, 2009 at 10:11 AM, Robert Kern <robert.kern@gmail.com> wrote:

> On Wed, Sep 2, 2009 at 09:38, Gökhan Sever<gokhansever@gmail.com> wrote:
> > Hello,
> >
> > I want to be able to parse a binary file which holds information about
> > the experiment configuration and, of course, the data. Both the
> > configuration and data sections are variable-length. A chunk of this data
> > is shown below (after a binary read operation):
> >
> > '\x00\x00@\x00$\x00\x02\x00\x12\x00\xff\x00\x00\x00U\xaa\xfa\xffd\x00\x08\x00\x01\x00\x08\x00\xff\x00\x00\x00U\xaa\xfb\xffl\x00\xab\x00\x01\x00\xab\x00\xff\x00\x00\x00U\xaa\xe7\x03\x17\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00U\xaa\xd9\x07\x04\x00\x02\x00\r\x00\x06\x00\x03\x00\x00\x00\x01\x00\x00\x00\xd9\x07\x04\x00\x02\x00\r\x00\x06\x00\x03\x00\x00\x00\x01\x00\x00\x00prj.300\x00;
> > Version = 1\n', 'ProjectName = PME1 2009 King Air N825ST\n', 'FlightId =
> > \n', 'AircraftType = WMI King Air 200\n', 'AircraftId = N825ST\n',
> > 'OperatorName = Weather Modification Inc.\n', 'Comments = \n', '\x00\x00@
> >
> > In binary form the file is 1.3 MB; when written to a text file it expands
> > to 3.7 MB, approximately 4 million characters. When fully processed (with
> > IDL code) it produces 86 separate configuration files and 46 ASCII data
> > files, covering about 10 to 15 different instruments in various
> > combinations and sampling rates.
> >
> > I attempted to use the re module; however, parsing the file takes much
> > longer than I expected. What would be the wisest and fastest way to tackle
> > this issue? Once the data and metadata are successfully reconstructed, I am
> > planning to use a more modular structure like HDF5 or netCDF4 for easy
> > data storage and analysis.
>
> Are there fixed delimiters? Like '\x00\x00@\x00' perhaps? It might be
> faster to search for those using .find() instead of regexes.
>
> Without more information about how the file format gets split up, I'm
> not sure we can make good suggestions.
>
> --
> Robert Kern
>
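Robert's `.find()` suggestion can be sketched in Python 3 as follows. This is a minimal sketch, not code from the thread: `split_on_delimiter` is a hypothetical helper, and the default delimiter `b"\x00\x00@\x00"` is only Robert's guess at a record marker, not a confirmed part of the file format. It walks the buffer with `bytes.find()` and returns each chunk together with the byte offset of the delimiter that precedes it.

```python
def split_on_delimiter(data, delim=b"\x00\x00@\x00"):
    """Return (offset, chunk) pairs for every chunk that follows `delim`.

    `offset` is the position of the delimiter in `data`; `chunk` runs up to
    the next delimiter (or the end of the buffer). Bytes before the first
    delimiter are ignored. The default delimiter is an assumption taken from
    the thread, not a documented part of the format.
    """
    if not delim:
        raise ValueError("delimiter must be non-empty")
    # Collect the start offset of every delimiter using bytes.find().
    starts = []
    pos = data.find(delim)
    while pos != -1:
        starts.append(pos)
        pos = data.find(delim, pos + len(delim))
    # Slice out the bytes between consecutive delimiters.
    chunks = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(data)
        chunks.append((start, data[start + len(delim):end]))
    return chunks
```

With the whole file read at once (for example `data = open(path, "rb").read()`, where `path` is a placeholder), this is a single forward pass over the buffer. If the offsets are not needed, `data.split(delim)` performs the same split in one call; its first element is whatever precedes the first delimiter.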

Fixed delimiters... That is what I used to parse the metadata with a regex.

Something like:

    r = re.compile("\0;.+?\0\0@\0\$", re.DOTALL)

which extracts the portions I am interested in. However, I have yet to figure
out how to parse the separate data streams; I couldn't find a way to tell
which data blocks go with which device.
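A minimal Python 3 sketch of the metadata step, assuming each metadata block is plain `Key = value` text sitting between a `\x00;` marker and the next `\x00\x00@\x00$` delimiter, as in the sample above. The pattern is Gökhan's regex rewritten as a bytes pattern; `parse_metadata` is a hypothetical helper, and Latin-1 decoding is an assumption about the text encoding.

```python
import re

# Gökhan's pattern as a Python 3 bytes regex: capture everything between a
# b"\x00;" marker and the next b"\x00\x00@\x00$" delimiter (non-greedy).
METADATA_RE = re.compile(rb"\x00;(.+?)\x00\x00@\x00\$", re.DOTALL)


def parse_metadata(data):
    """Return one dict per metadata block, mapping each key to its value.

    Assumes the block is 'Key = value' lines; Latin-1 is an assumed encoding
    (it never fails to decode, so stray binary bytes will not raise).
    """
    blocks = []
    for match in METADATA_RE.finditer(data):
        text = match.group(1).decode("latin-1")
        fields = {}
        for line in text.splitlines():
            # Split on the first "=" only, so values may contain "=".
            key, sep, value = line.partition("=")
            if sep:  # skip lines that are not key/value pairs
                fields[key.strip()] = value.strip()
        blocks.append(fields)
    return blocks
```

Because the pattern is non-greedy, each match ends at the first delimiter after the `\x00;` marker; if binary data happens to contain the bytes `\x00;`, a match could start too early, so the extracted blocks are worth spot-checking.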

I put the test binary file I am using at:

http://drop.io/1plh5rt


-- 
Gökhan


More information about the NumPy-Discussion mailing list