[Numpy-discussion] Efficient reading of binary data

Robert Kern robert.kern@gmail....
Thu Apr 3 15:47:28 CDT 2008


On Thu, Apr 3, 2008 at 3:30 PM, Nicolas Bigaouette
<nbigaouette@gmail.com> wrote:
> Hi,
>
> I have a C program which outputs large (~GB) files. It is a simple binary
> dump of an array of structure containing 9 doubles. You can see this as a
> double 1D array of size 9*Stot (Stot being the allocated size of the array
> of structure). The 1D array represents a 3D array (Sx * Sy * Sz = Stot)
> containing 9 values per cell.
>
> I want to read these files in the most efficient way possible, and I would
> like to have your insight on this.
>
> Right now, the fastest way I found was:
> imzeros = zeros((Sy,Sz),dtype=float64,order='C')
>  imex = imshow(imzeros)
> f = open(filename, 'rb')
> data = numpy.fromfile(file=f, dtype=numpy.float64, count=9*Stot)
> mask = numpy.arange(6,9*Stot,9)

This is something you can do much, much more efficiently by using a
slice instead of indexing with an integer array.
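
As a rough illustration of the difference (a minimal sketch on a tiny
made-up array, not the real file; n_cells is a hypothetical stand-in for
Stot), a stepped slice picks the same elements as the integer mask but
returns a view rather than a copy:

```python
import numpy as np

# Hypothetical small stand-in for the file contents: 4 cells, 9 doubles each.
n_cells = 4
data = np.arange(9 * n_cells, dtype=np.float64)

# Integer-array ("fancy") indexing always copies the selected elements.
mask = np.arange(6, 9 * n_cells, 9)
ex_copy = data[mask]

# A slice with a step selects the same elements but returns a view.
ex_view = data[6::9]

same_values = np.array_equal(ex_copy, ex_view)   # same elements selected
copy_shares = np.shares_memory(ex_copy, data)    # the copy owns new memory
view_shares = np.shares_memory(ex_view, data)    # the view reuses data's memory
print(same_values, copy_shares, view_shares)     # True False True
```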

> Ex = data[mask].reshape((Sz,Sy,Sx), order='C').transpose()
>  imex.set_array(squeeze(Ex[:,:,z]))
>
> The arrays will be big, so everything should be well optimized. I have
> multiple questions:
>
> 1) Should I change this:
> Ex = data[mask].reshape((Sz,Sy,Sx), order='C').transpose()
>  imex.set_array(squeeze(Ex[:,:,z]))
> to:
>  imex.set_array(squeeze(data[mask].reshape((Sz,Sy,Sx),
> order='C').transpose()[:,:,z]))
> I mean, if I don't use a temporary variable, will it be faster or less
> memory hungry?

No. The temporary exists whether you give it a name or not. If you use
data[6::9] instead of data[mask], you won't be using any extra memory
at all. The arrays will just be views into the original array.
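
A small sketch of that chain, assuming tiny hypothetical grid sizes in
place of the real Sx, Sy, Sz: the slice, the reshape and the transpose
should all stay views of data, which np.shares_memory can confirm.

```python
import numpy as np

Sx, Sy, Sz = 2, 3, 4  # hypothetical grid sizes, far smaller than the real ones
data = np.arange(9 * Sx * Sy * Sz, dtype=np.float64)

# Slice, reshape and transpose: each step only changes shape and strides.
Ex = data[6::9].reshape((Sz, Sy, Sx), order='C').transpose()

# Ex is indexed as Ex[x, y, z]; taking one z-plane is again just a view.
z = 1
plane = Ex[:, :, z]

still_a_view = np.shares_memory(Ex, data)
plane_is_view = np.shares_memory(plane, data)

# Writing through the view changes the original buffer, which shows that
# no separate copy was made along the way.
Ex[0, 0, 0] = -1.0
wrote_through = data[6] == -1.0
```

With an integer mask in place of the slice, the first step would already
have copied one ninth of the data.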

> 2) If not, does the operation "Ex = " update the variable data or create
> another one?

It just reassigns the name "Ex" to a different object specified on the
right-hand side of the assignment. The relevant question is whether the
expression on the right-hand side takes up more memory.

> Ideally I would like to only update it. Maybe this would be
> better:
>
> Ex[:,:,:] = data[mask].reshape((Sz,Sy,Sx), order='C').transpose()
>
> Would it?

If you use data[6::9] instead of data[mask], you should just use "Ex =
" since no new memory will be used on the RHS.
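
To make the two forms of assignment concrete, here is a minimal sketch
(the grid sizes and the name Ex_buf are made up for illustration).
Rebinding the name costs nothing, while assigning into Ex[:,:,:] needs a
second buffer and copies every element into it:

```python
import numpy as np

Sx, Sy, Sz = 2, 3, 4  # hypothetical grid sizes
data = np.arange(9 * Sx * Sy * Sz, dtype=np.float64)
view = data[6::9].reshape((Sz, Sy, Sx), order='C').transpose()

# Rebinding: "Ex" becomes another name for the view; nothing is copied.
Ex = view
rebinding_shares = np.shares_memory(Ex, data)

# In-place update: a separate buffer (hypothetical name Ex_buf) must
# already exist, and every element of the view is copied into it.
Ex_buf = np.empty((Sx, Sy, Sz), dtype=np.float64)
Ex_buf[:, :, :] = view
inplace_shares = np.shares_memory(Ex_buf, data)

print(rebinding_shares, inplace_shares)  # True False
```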

> 3) The machine where the code will be run might be big-endian. Is there a
> way for python to read the big-endian file and "translate" it automatically
> to little-endian? Something like "numpy.fromfile(file=f,
> dtype=numpy.float64, count=9*Stot, endianness='big')"?

Use dtype=numpy.dtype('>f8'). The '>' marks big-endian byte order and
'f8' an 8-byte float.
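
A short round trip as a sketch (the temporary file name and the sample
values are made up): write a few doubles in big-endian order, read them
back with the explicit '>f8' dtype, and optionally convert to native
byte order.

```python
import os
import tempfile

import numpy as np

values = np.array([1.0, 2.5, -3.0])

# Write a big-endian file to stand in for output from a big-endian machine.
path = os.path.join(tempfile.mkdtemp(), "big_endian.bin")  # hypothetical file name
values.astype('>f8').tofile(path)

# Reading with an explicit big-endian dtype gives correct values on any host.
data = np.fromfile(path, dtype=np.dtype('>f8'))

# Optional: convert to the machine's native byte order. astype() copies,
# so for multi-gigabyte files it may be better to keep the '>f8' array.
native = data.astype(np.float64)

os.remove(path)
```

Arithmetic and slicing work directly on the '>f8' array, so the
conversion to native order is only needed if some other code insists on
native doubles.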

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco

