[Numpy-discussion] Efficient reading of binary data

Nicolas Bigaouette nbigaouette@gmail....
Thu Apr 3 15:30:40 CDT 2008


I have a C program which outputs large (~GB) files. Each file is a simple binary
dump of an array of structures, each structure containing 9 doubles. You can see
this as a 1D array of doubles of size 9*Stot (Stot being the allocated size of
the array of structures). The 1D array represents a 3D array (Sx * Sy * Sz =
Stot) containing 9 values per cell.

I want to read these files in the most efficient way possible, and I would
like to have your insight on this.

Right now, the fastest way I found was:
from pylab import imshow
import numpy
imzeros = numpy.zeros((Sy,Sz), dtype=numpy.float64, order='C')
imex = imshow(imzeros)
f = open(filename, 'rb')
data = numpy.fromfile(file=f, dtype=numpy.float64, count=9*Stot)
mask_Ex = numpy.arange(6, 9*Stot, 9)
Ex = data[mask_Ex].reshape((Sz,Sy,Sx), order='C').transpose()
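For reference, here is a minimal self-contained sketch of that read path, with small made-up dimensions (Sx, Sy, Sz are illustrative, and the file is simulated with an in-memory buffer). Note that a basic slice like data[6::9] gives a view without building an index array, unlike fancy indexing with arange:

```python
import numpy as np

# Hypothetical small dimensions, just for illustration
Sx, Sy, Sz = 2, 3, 4
Stot = Sx * Sy * Sz

# Simulate the binary file: 9 doubles per cell, values 0, 1, 2, ...
raw = np.arange(9 * Stot, dtype=np.float64).tobytes()

# Parse the buffer as flat float64 data (fromfile would do the same on a file)
data = np.frombuffer(raw, dtype=np.float64, count=9 * Stot)

# Field 6 of each 9-double record, via a basic slice (a strided view, no
# index array); the reshape then materializes a contiguous copy
Ex = data[6::9].reshape((Sz, Sy, Sx), order='C').transpose()
```
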

The arrays will be big, so everything should be well optimized. I have
multiple questions:

1) Should I change this:
Ex = data[mask_Ex].reshape((Sz,Sy,Sx), order='C').transpose()
I mean, if I don't use a temporary variable, will it be faster or less
memory-hungry?

2) If not, does the operation "Ex = " update the existing array or create
another one? Ideally I would like to only update it. Maybe this would be
Ex[:,:,:] = data[mask_Ex].reshape((Sz,Sy,Sx), order='C').transpose()
Would it?
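A quick self-contained check of that rebinding-versus-in-place distinction (the names here are illustrative, not from the code above):

```python
import numpy as np

data = np.zeros((2, 2))
alias = data

# "alias = ..." only rebinds the name to a new array; data is untouched
alias = np.ones((2, 2))
rebind_touched_original = bool(data.any())        # False: data still all zeros

# "data[:, :] = ..." writes through into the existing buffer, so any
# other reference to the same array sees the new values
view = data
data[:, :] = np.ones((2, 2))
inplace_visible_through_view = bool(view.all())   # True: view sees the write
```
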

3) The machine where the code will be run might be big-endian. Is there a
way for numpy to read the big-endian file and "translate" it automatically
to little-endian? Something like "numpy.fromfile(file=f,
dtype=numpy.float64, count=9*Stot, endianness='big')"?
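As far as I know there is no endianness= keyword, but numpy dtypes can carry an explicit byte order ('>f8' is big-endian float64), and conversion happens transparently. A minimal sketch, simulating a big-endian file with an in-memory buffer:

```python
import numpy as np

Stot = 4
# Simulate a big-endian file: cast native values to '>f8' and dump the bytes
native = np.arange(9 * Stot, dtype=np.float64)
big_endian_bytes = native.astype('>f8').tobytes()

# Read back declaring the big-endian dtype; numpy swaps bytes on access
data = np.frombuffer(big_endian_bytes, dtype='>f8', count=9 * Stot)

# Optionally convert once to the machine's native byte order afterwards
data_native = data.astype(np.float64)
```

The same dtype string works directly in numpy.fromfile(f, dtype='>f8', ...), so no separate swapping pass over the GB-sized array is needed at read time.
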

Thanx a lot! ;)
