[Numpy-discussion] Fastest way to parse a specific binary file

Gökhan Sever gokhansever@gmail....
Wed Sep 2 09:38:22 CDT 2009


Hello,

I want to parse a binary file that holds both the experiment configuration
and the data. Both the configuration and data sections are variable-length.
A chunk of this data is shown below (after a binary read operation):

'\x00\x00@\x00$\x00\x02\x00\x12\x00\xff\x00\x00\x00U\xaa\xfa\xffd\x00\x08\x00\x01\x00\x08\x00\xff\x00\x00\x00U\xaa\xfb\xffl\x00\xab\x00\x01\x00\xab\x00\xff\x00\x00\x00U\xaa\xe7\x03\x17\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00U\xaa\xd9\x07\x04\x00\x02\x00\r\x00\x06\x00\x03\x00\x00\x00\x01\x00\x00\x00\xd9\x07\x04\x00\x02\x00\r\x00\x06\x00\x03\x00\x00\x00\x01\x00\x00\x00prj.300\x00;
Version = 1\n', 'ProjectName = PME1 2009 King Air N825ST\n', 'FlightId =
\n', 'AircraftType = WMI King Air 200\n', 'AircraftId = N825ST\n',
'OperatorName = Weather Modification Inc.\n', 'Comments = \n', '\x00\x00@

In binary form the file is 1.3 MB; written out as a text file it expands to
3.7 MB, approximately 4 million characters. When fully processed (with IDL
code) it produces 86 separate configuration files and 46 ASCII data files,
covering about 10-15 different instruments in various combinations and
sampling rates.

I attempted to use the re module, but parsing the file takes much longer
than I expected. What would be the wisest and fastest way to tackle this?
Once the data and metadata are successfully reconstructed, I plan to use a
more modular format such as HDF5 or netCDF4 for easier data storage and
analysis.
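
To make the question concrete, here is a minimal sketch of the kind of
non-regex approach I have in mind. It rests on an assumption read off the
sample above, not on any format specification: the leading bytes look like
16-byte records of eight little-endian unsigned 16-bit words, each ending in
the marker 'U\xaa' (0xAA55). The names RECORD, MARKER and read_records are
hypothetical, and the meaning of the individual words is unknown to me.

```python
import struct

# Assumption (inferred from the sample only): each header record is
# eight little-endian unsigned 16-bit words, the last one being 0xAA55.
RECORD = struct.Struct("<8H")
MARKER = 0xAA55


def read_records(buf, offset=0):
    """Collect consecutive 16-byte records that end in MARKER.

    Returns (records, next_offset); stops at the first block that does
    not end in MARKER or when fewer than 16 bytes remain.
    """
    records = []
    while offset + RECORD.size <= len(buf):
        words = RECORD.unpack_from(buf, offset)
        if words[-1] != MARKER:
            break
        records.append(words[:-1])  # drop the marker word
        offset += RECORD.size
    return records, offset


# First two records from the sample above, plus four trailing bytes.
sample = (
    b"\x00\x00@\x00$\x00\x02\x00\x12\x00\xff\x00\x00\x00U\xaa"
    b"\xfa\xffd\x00\x08\x00\x01\x00\x08\x00\xff\x00\x00\x00U\xaa"
    b"\xd9\x07\x04\x00"
)
records, end = read_records(sample)
print(records, end)
```

If the layout really is fixed-width like this, the same records could
presumably be read in one call with numpy.frombuffer and a '<u2' dtype
instead of a Python loop, leaving only the text section for string handling.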

Thank you.


-- 
Gökhan