[Numpy-discussion] Loading a > GB file into array

Sebastian Haase haase@msg.ucsf....
Fri Dec 21 03:02:28 CST 2007

On Dec 21, 2007 12:11 AM, Martin Spacek <numpy@mspacek.mm.st> wrote:
> >> By the way, I installed 64-bit linux (ubuntu 7.10) on the same machine,
> >> and now numpy.memmap works like a charm. Slicing around a 15 GB file is fun!
> >>
> > Thanks for the feedback !
> > Did you get the kind of speed you need and/or the speed you were hoping for ?
> Nope. Like I wrote earlier, it seems there isn't time for disk access in
> my main loop, which is what memmap is all about. I resolved this by
> loading the whole file into memory as a python list of 2D arrays,
> instead of one huge contiguous 3D array. That got me an extra 100 to 200
> MB of physical memory to work with (about 1.4GB out of 2GB total) on
> win32, which is all I needed.
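
For reference, the list-of-2D-arrays approach described above might look roughly like the sketch below. The file layout (raw frames of a fixed height and width, one after another), the function name load_frames, and the tiny demo file are all assumptions for illustration; they are not taken from Martin's code. The likely reason it helps on win32 is that each frame only needs its own small contiguous block of address space, rather than one block big enough for the whole 3D array.

```python
import os
import tempfile

import numpy as np


def load_frames(path, nframes, height, width, dtype=np.uint8):
    """Read a raw file as a list of independent 2D arrays, one per frame.

    Hypothetical helper: assumes the file is just nframes * height * width
    items of `dtype`, stored frame after frame with no header.
    """
    frames = []
    with open(path, "rb") as f:
        for _ in range(nframes):
            # fromfile continues from the current file position
            frame = np.fromfile(f, dtype=dtype, count=height * width)
            frames.append(frame.reshape(height, width))
    return frames


# Tiny demo file standing in for the real multi-GB movie
data = np.arange(4 * 3 * 5, dtype=np.uint8).reshape(4, 3, 5)
path = os.path.join(tempfile.mkdtemp(), "movie.bin")
data.tofile(path)

frames = load_frames(path, nframes=4, height=3, width=5)
```

Indexing changes from data[i] to frames[i]; slicing across frames is no longer a single array operation, which is the price paid for not needing one contiguous allocation.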

Instead of saying "memmap is ALL about disk access" I would rather
say that "memmap is all about SMART disk access" -- what I mean
is that memmap should run as fast as a normal ndarray when it works on
the cached part of an array.  Maybe there is a way of telling memmap
when and what to cache, and when to sync that cache to the disk.
In other words, memmap should perform just like an in-physical-memory
array -- only that once in a while it saves/loads to/from the disk.
Or is this just wishful thinking?
Is there a way of "preloading" a given part into cache
(physical memory), or of preventing disk writes at "bad times"?
How about doing the sync from a different thread ;-)
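
A rough sketch of what this could look like with the existing numpy.memmap interface, offered as an approximation rather than a real answer to the questions above: "preloading" is emulated by copying a slice into an ordinary ndarray, and the sync is an explicit flush() called from a helper thread. The file name, shape and dtype are made up for the demo. Note that this does not control the OS page cache, the OS may still write dirty pages back whenever it likes, and whether the flush really overlaps with other Python work depends on the GIL being released during the call, which is not checked here.

```python
import os
import tempfile
import threading

import numpy as np

# Small stand-in for the large file (name, shape and dtype are assumptions)
path = os.path.join(tempfile.mkdtemp(), "big.dat")
shape = (100, 8, 8)
mm = np.memmap(path, dtype=np.float32, mode="w+", shape=shape)
mm[:] = 1.0
mm.flush()

# "Preload": copy one block into ordinary physical memory.
# np.array() returns a plain ndarray, no longer tied to the file.
block = np.array(mm[10:20])

# Time-critical work happens on the in-memory copy only
block *= 2.0

# Write the block back at a convenient time ...
mm[10:20] = block

# ... and do the sync from a different thread
t = threading.Thread(target=mm.flush)
t.start()
t.join()

# Reopen read-only to confirm that the data reached the file
del mm
check = np.memmap(path, dtype=np.float32, mode="r", shape=shape)
```

The copy costs extra memory for the block, so this only makes sense when the working set is much smaller than the file.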

