[SciPy-User] unpacking binary data from a C structure

Anne Archibald peridot.faceted@gmail....
Tue Apr 13 10:55:53 CDT 2010


On 13 April 2010 11:27, Charles R Harris <charlesr.harris@gmail.com> wrote:
>
>
> On Tue, Apr 13, 2010 at 7:20 AM, Tom Kuiper <kuiper@jpl.nasa.gov> wrote:
>>
>> Dear list,
>>
>> here's something I find very strange.  I have a C structure defined as:
>>
>> typedef struct
>> {
>>  unsigned short spcid;         /* station id - 10, 40, 60, 21 */
>>  unsigned short vsrid;         /* vsr1a, vsr1b ... from enum */
>>  unsigned short chanid;        /* subchannel id 0,1,2,3 */
>>  unsigned short bps;           /* number of bits per sample - 1, 2, 4,
>>                                   8, or 16 */
>>  unsigned long  srate;         /* number of samples per second in
>>                                   kilo-samples per second */
>>  unsigned short error;         /* hw err flag, dma error or num_samples
>>                                   error, 0 ==> no errors */
>>  unsigned short year;          /* time tag - year */
>>  unsigned short doy;           /* time tag - day of year */
>>  unsigned long  sec;           /* time tag - second of day */
>>  double         freq;          /* in Hz */
>>  unsigned long  orate;         /* number of statistics samples per
>>                                   second */
>>  unsigned short nsubchan;      /* number of output sub chans */
>> }
>> stats_hdr_t;
>>
>> With Python's struct module, I expected the unpack format to be
>> 'HHHH L HHH L d L H'.
>> Here's a real header structure as it appears at the head of a file:
>>
>>  0000000  000d  0001  0006  0008
>>  0000010  4240  000f  0000  0000
>>  0000020  0000  07da  0064  4730
>>  0000030  0001  0000  0000  0000
>>  0000040  d800  d31d  421d  03e8
>>  0000050  0000  0000  0000  0002
>>
>> Decoded as unsigned shorts:
>>
>>  0000000    13     1     6     8
>>  0000010 16960    15     0     0
>>  0000020     0  2010   100 18224
>>  0000030     1     0     0     0
>>  0000040 55296 54045 16925  1000
>>  0000050     0     0     0     2
>>
>> Matching these to the stats_hdr_t with 'unpack' notation:
>>
>>  0000000     H     H     H     H
>>  0000010    L1    L2     H     ?
>>  0000020     ?     H     H    L1
>>  0000030    L2     ?     ?    D1
>>  0000040    D2    D3    D4    L1
>>  0000050    L2     ?     ?     H
>>
>> So the actual format is 'HHHH L H xxxx HH L xxxx d L xxxx H'
>> What are all the mystery 4-byte blanks?  This works:
>>
>> buf = fd.read(48)  # the format below consumes 48 bytes
>> header = unpack_from('=4H LH2x 2x2HL4xdL4xH',buf)
>>
>> Since unpacking binary data must be a fairly common activity in
>> scientific circles, I hope you will have some suggestions.
>>
>
> I presume you didn't produce the data, but as a rule of thumb C structures
> should not be used to write out binary data, as the binary layout of the
> data won't be portable. Text, NetCDF, HDF5, or some other standard data
> format is preferable, with text being perhaps the most portable. That said,
> lots of old data collection programs write out C structures, and no doubt
> newer programs do so also.
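
For the header quoted above, here is a minimal sketch that rebuilds the 48 bytes from the hex dump and unpacks them two ways. The first format is the one from the original post, with '<' in place of '=' so the byte order is pinned to little-endian. The second is only a guess that nothing in this thread confirms: it assumes the file was written where unsigned long is 8 bytes and the struct was packed on 2-byte boundaries, in which case the 4-byte blanks would be the high halves of the three long fields. The names FMT_PADDED, FMT_LONG8 and NAMES are made up for this illustration.

```python
import struct

# The 24 16-bit words from the hex dump, in file order.
# Assumption: the dump was made on a little-endian machine.
words = [
    0x000d, 0x0001, 0x0006, 0x0008,
    0x4240, 0x000f, 0x0000, 0x0000,
    0x0000, 0x07da, 0x0064, 0x4730,
    0x0001, 0x0000, 0x0000, 0x0000,
    0xd800, 0xd31d, 0x421d, 0x03e8,
    0x0000, 0x0000, 0x0000, 0x0002,
]
buf = struct.pack("<24H", *words)

# Format from the original post: 4-byte longs plus explicit pad bytes.
FMT_PADDED = "<4H LH2x 2x2HL4xdL4xH"
# Hypothetical alternative: 8-byte longs ('Q'), no padding at all.
FMT_LONG8 = "<4H Q 3H Q d Q H"

NAMES = ("spcid", "vsrid", "chanid", "bps", "srate", "error",
         "year", "doy", "sec", "freq", "orate", "nsubchan")

padded = struct.unpack_from(FMT_PADDED, buf)
long8 = struct.unpack_from(FMT_LONG8, buf)
header = dict(zip(NAMES, long8))

print(struct.calcsize(FMT_PADDED), struct.calcsize(FMT_LONG8))
print(header)
```

Both formats span 48 bytes and give the same values for this particular header, because every byte that differs between the two readings happens to be zero. The dump alone therefore cannot tell them apart; the C compiler settings used by the writing program would settle it.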

There's also a FORTRAN binary format which one program I have to cope
with uses; the exact layout of those data files depends on the
compiler (g77 vs. gfortran) as well as the hardware. I'd also add FITS
to the list of self-describing portable binary formats that Python
supports well.
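
To illustrate the kind of layout dependence involved, here is a rough sketch of reading one record from a Fortran sequential unformatted file. It assumes each record is framed by a leading and a trailing byte count, and that both counts are 4-byte little-endian integers. That is a common arrangement, but the marker width and byte order vary with compiler and platform, so treat the default as an assumption to check against the actual file. The function name read_fortran_record is made up for this example.

```python
import io
import struct


def read_fortran_record(f, marker="<i"):
    """Return the payload bytes of the next record, or None at end of file.

    `marker` is the struct format of the record-length field; "<i"
    (4-byte little-endian) is an assumption, not a universal rule.
    """
    size = struct.calcsize(marker)
    head = f.read(size)
    if not head:
        return None  # clean end of file
    if len(head) < size:
        raise ValueError("truncated record marker")
    (nbytes,) = struct.unpack(marker, head)
    payload = f.read(nbytes)
    tail = f.read(size)
    if len(payload) != nbytes or len(tail) != size:
        raise ValueError("truncated record")
    if struct.unpack(marker, tail)[0] != nbytes:
        raise ValueError("record markers disagree; wrong marker format?")
    return payload


# Build a fake one-record file in memory so the sketch runs on its own.
body = struct.pack("<3d", 1.0, 2.0, 3.0)
length = struct.pack("<i", len(body))
f = io.BytesIO(length + body + length)

values = struct.unpack("<3d", read_fortran_record(f))
at_eof = read_fortran_record(f) is None
print(values, at_eof)
```

Comparing the leading and trailing counts is a cheap sanity check: if they disagree, the marker width or byte order is probably not what was assumed.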


Anne

> Chuck

