[Numpy-discussion] Efficient reading of binary data
Fri Apr 4 03:50:38 CDT 2008
On Fri, Apr 4, 2008 at 2:14 AM, Nicolas Bigaouette
> 2008/4/3, Robert Kern <email@example.com>:
> > On Thu, Apr 3, 2008 at 6:53 PM, Nicolas Bigaouette
> > <firstname.lastname@example.org> wrote:
> > > Thanks for the fast response, Robert ;)
> > >
> > > I changed my code to use the slice:
> > > E = data[6::9]
> > > It is indeed faster and uses less memory. Great.
> > >
> > > Thanks for the endianness tip! I knew there was something like this ;) I
> > > take it that, in '>f8', "f" means float and "8" means 8 bytes?
> > Yes, and the '>' means big-endian. '<' is little-endian, and '=' is
> > native-endian.
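To make the dtype strings concrete, here is a small sketch. The byte string is only an illustrative IEEE-754 encoding of 1.0 stored big-endian, not data from the thread: the same eight bytes decode to 1.0 under '>f8' but to an unrelated tiny number under '<f8'.

```python
import numpy as np

big = np.dtype('>f8')     # 'f' = floating point, '8' = 8 bytes, '>' = big-endian
little = np.dtype('<f8')  # same size and kind, little-endian
native = np.dtype('=f8')  # whatever byte order this machine uses

# 1.0 encoded as a big-endian IEEE-754 double (illustrative bytes only).
raw = b'\x3f\xf0\x00\x00\x00\x00\x00\x00'

as_big = np.frombuffer(raw, dtype=big)        # decodes to 1.0
as_little = np.frombuffer(raw, dtype=little)  # wrong byte order: a different value

# Convert to native byte order (copies and byte-swaps if needed) for fast arithmetic.
as_native = as_big.astype(native)

print(big.kind, big.itemsize, as_big[0], as_native[0])
```

On a big-endian machine '>f8' and '=f8' describe the same layout, so code written with an explicit '>f8' runs unchanged there.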
> I just tested it on a big-endian machine, and it does indeed work great :)
> > > From some benchmarks, I see that the slowest part is disk access. It
> > > slows the display of data from around 1 s (when the data is in the OS
> > > cache buffer) to 8 s.
> > >
> > > So the next step would be to only read the needed data from the binary
> > > file... Is it possible to read from a file with a slice? So instead of:
> > >
> > > data = numpy.fromfile(file=f, dtype=float_dtype, count=9*Stot)
> > > E = data[6::9]
> > > maybe something like:
> > > E = numpy.fromfile(file=f, dtype=float_dtype, count=9*Stot, slice=6::9)
> > Instead of reading using fromfile(), you can try memory-mapping the array.
> > from numpy import memmap
> > E = memmap(f, dtype=float_dtype, mode='r')[6::9]
> > That may or may not help. At least, it should decrease the latency
> > before you start pulling out frames.
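A self-contained sketch comparing the two approaches from the thread, with a throwaway file standing in for the real one (`Stot`, the file name and its contents are made up for illustration). Both give the same values; with `numpy.memmap` the data is only pulled from disk when the slice is actually accessed.

```python
import os
import tempfile

import numpy as np

float_dtype = np.dtype('>f8')  # big-endian 8-byte floats, as in the thread
Stot = 1000                     # hypothetical record count, 9 values per record

# Throwaway file laid out like the one in the thread: Stot records of 9 doubles.
records = np.arange(9 * Stot, dtype=np.float64).astype(float_dtype)
path = os.path.join(tempfile.mkdtemp(), 'data.bin')
records.tofile(path)  # writes the raw big-endian bytes

# Approach 1: read everything into memory, then take every 9th value from index 6.
with open(path, 'rb') as f:
    data = np.fromfile(f, dtype=float_dtype, count=9 * Stot)
E_read = data[6::9]
# Equivalent view of the same column: data.reshape(-1, 9)[:, 6]

# Approach 2: memory-map the file; slicing only creates a strided view.
mm = np.memmap(path, dtype=float_dtype, mode='r')
E_view = mm[6::9]
E_mapped = np.array(E_view)  # touching the elements triggers the actual reads
```

Because consecutive `[6::9]` elements are only 72 bytes apart, nearly every page of the file is likely to be touched anyway, so for this interleaved layout the gain is probably lower memory use and latency rather than less total disk traffic.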
> It did not work out of the box (memmap() takes the filename and not a file
> handle), but anyway, it's getting late.
Coincidentally, I'm trying to do exactly the same thing right now...
What is the best way to memory-map a file that is already open?
I have to read some text (header info) off the beginning of the file
before I know where the data actually starts.
I could of course get the position at that point (f.tell()), close
the file, and reopen it using memmap.
However, this doesn't sound optimal to me...
Any hints ?
Could numpy's memmap be changed to also accept file objects, or is there
a "rule" that memmap always has to have access to the entire file?
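For reference, current NumPy releases accept an open file object as well as a filename in `numpy.memmap`, together with an `offset` in bytes counted from the start of the file, so the map does not have to cover the whole file. A sketch under that assumption, using a hypothetical header format (text lines ending in an `END` line, with a made-up `STOT` count): the header is parsed from the open file and only the data section is mapped. Passing the filename with `offset=f.tell()` instead of the file object should work the same way.

```python
import os
import tempfile

import numpy as np

float_dtype = np.dtype('>f8')

# Build a throwaway file: hypothetical text header, then raw big-endian doubles.
path = os.path.join(tempfile.mkdtemp(), 'with_header.bin')
values = np.arange(9 * 4, dtype=np.float64).astype(float_dtype)
with open(path, 'wb') as out:
    out.write(b'# example header\nSTOT 4\nEND\n')
    values.tofile(out)

with open(path, 'rb') as f:  # binary mode, so f.tell() is a plain byte offset
    stot = None
    while True:
        line = f.readline()
        if not line:
            raise ValueError('no END line found in header')
        if line.startswith(b'STOT'):
            stot = int(line.split()[1])  # record count from the (made-up) header
        if line.strip() == b'END':
            break
    data_start = f.tell()  # first byte after the header

    # Map only the data section of the already-open file; no close/reopen needed.
    data = np.memmap(f, dtype=float_dtype, mode='r',
                     offset=data_start, shape=(9 * stot,))
    E = np.array(data[6::9])  # copy the wanted column out of the map
```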