[Numpy-discussion] Efficient reading of binary data

Sebastian Haase haase@msg.ucsf....
Fri Apr 4 03:50:38 CDT 2008


On Fri, Apr 4, 2008 at 2:14 AM, Nicolas Bigaouette
<nbigaouette@gmail.com> wrote:
> 2008/4/3, Robert Kern <robert.kern@gmail.com>:
>
> > On Thu, Apr 3, 2008 at 6:53 PM, Nicolas Bigaouette
> > <nbigaouette@gmail.com> wrote:
> > > Thanx for the fast response Robert ;)
> > >
> > > I changed my code to use the slice:
> > >  E = data[6::9]
> > > It is indeed faster and eats less memory. Great.
> > >
> > > Thanx for the endianness! I knew there was something like this ;) I suspect
> > > that, in '>f8', "f" means float and "8" means 8 bytes?
> >
> >
> > Yes, and the '>' means big-endian. '<' is little-endian, and '=' is
> > native-endian.
>
> I just tested it on a big-endian machine, and it does indeed work great :)
>
>
> > > From some benchmarks, I see that the slowest thing is disk access. It can
> > > slow the displaying of data from around 1sec (when data is in os cache or
> > > buffer) to 8sec.
> > >
> > > So the next step would be to only read the needed data from the binary
> > > file... Is it possible to read from a file with a slice? So instead of:
> > >
> > > data = numpy.fromfile(file=f, dtype=float_dtype, count=9*Stot)
> > > E = data[6::9]
> > > maybe something like:
> > > E = numpy.fromfile(file=f, dtype=float_dtype, count=9*Stot, slice=6::9)
> >
> >
> > Instead of reading using fromfile(), you can try memory-mapping the array.
> >
> >   from numpy import memmap
> >   E = memmap(f, dtype=float_dtype, mode='r')[6::9]
> >
> > That may or may not help. At least, it should decrease the latency
> > before you start pulling out frames.
> >
> >
> It did not work out of the box (memmap() takes the filename and not a file
> handle), but anyway, it's getting late.
>
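For reference, here is a minimal sketch of the memmap approach discussed above, passing a file name rather than an open handle. The file name, Stot and the sample values are made up for illustration; only the '>f8' dtype and the [6::9] slice come from the thread.

```python
import os
import tempfile

import numpy as np

float_dtype = np.dtype(">f8")  # big-endian 8-byte float, as in the thread
Stot = 5                       # hypothetical number of records
ncols = 9                      # nine values per record, matching data[6::9]

# Write a small illustrative file (hypothetical name and contents).
values = np.arange(ncols * Stot).astype(float_dtype)
path = os.path.join(tempfile.mkdtemp(), "example.bin")
values.tofile(path)  # raw bytes, written in the array's own byte order

# Map the file by name; nothing is read until elements are accessed.
mm = np.memmap(path, dtype=float_dtype, mode="r")

# Copy only the strided column into an ordinary in-memory array.
E = np.array(mm[6::9])

del mm  # drop the mapping once the copy exists
```

Whether this is actually faster than fromfile() depends on the OS and the access pattern: a stride of 9 doubles still touches most disk pages, so the main gain is probably lower latency and memory use rather than less I/O.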
Hi,
Coincidentally, I'm trying to do exactly the same thing right now .....

What is the best way to memory-map a file that is already open?

I have to read some text (header info) off the beginning of the file
before I know where the data actually starts.
I could of course get the position at that point (f.tell()), close
the file, and reopen it using memmap.
However, this doesn't sound optimal to me ....

Any hints ?
Could numpy's memmap be changed to also accept file objects, or is there
a "rule" that memmap always has to have access to the entire file?
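
To make the workaround concrete, this is roughly what I have in mind, as a sketch only. It relies on the offset argument of memmap, which gives the byte position where the array data starts. The header layout (two text lines) and the file name are invented for the example.

```python
import os
import tempfile

import numpy as np

float_dtype = np.dtype("<f8")

# Build a hypothetical file: two text header lines, then raw doubles.
header = b"hypothetical header\nncols=9\n"
payload = np.arange(18).astype(float_dtype)
path = os.path.join(tempfile.mkdtemp(), "with_header.bin")
with open(path, "wb") as f:
    f.write(header)
    f.write(payload.tobytes())

# Read the header through a normal file object and note where the data starts.
with open(path, "rb") as f:
    f.readline()
    f.readline()
    offset = f.tell()

# Reopen by name, mapping only from the data start onwards.
data = np.memmap(path, dtype=float_dtype, mode="r", offset=offset)
E = np.array(data[6::9])  # copy the strided column out of the mapping
```

The mapping does not have to begin at the start of the file, so the header is simply skipped. The cost of this version is the second open of the file.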


Thanks,
Sebastian Haase

