[Numpy-discussion] Loading a > GB file into array

Sebastian Haase haase@msg.ucsf....
Sat Dec 1 04:24:50 CST 2007


On Dec 1, 2007 12:09 AM, Martin Spacek <numpy@mspacek.mm.st> wrote:
> Kurt Smith wrote:
>  > You might try numpy.memmap -- others have had success with it for
>  > large files (32 bit should be able to handle a 1.3 GB file, AFAIK).
>
> Yeah, I looked into numpy.memmap. Two issues with that. I need to
> eliminate as much disk access as possible while my app is running. I'm
> displaying stimuli on a screen at 200Hz, so I have up to 5ms for each
> movie frame to load before it's too late and it drops a frame. I'm sort
> of faking a realtime OS on windows by setting the process priority
> really high. Disk access in the middle of that causes frames to drop. So
> I need to load the whole file into physical RAM, although it need not be
> contiguous. memmap doesn't do that, it loads on the fly as you index
> into the array, which drops frames, so that doesn't work for me.
>
> The 2nd problem I had with memmap was that I was getting a WindowsError
> related to memory:
>
>  >>> data = np.memmap(1.3GBfname, dtype=np.uint8, mode='r')
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\bin\Python25\Lib\site-packages\numpy\core\memmap.py", line
> 67, in __new__
>     mm = mmap.mmap(fid.fileno(), bytes, access=acc)
> WindowsError: [Error 8] Not enough storage is available to process this
> command
>
>
> This was for the same 1.3GB file. This is different from previous memory
> errors I mentioned. I don't get this on ubuntu. I can memmap a file up
> to 2GB on ubuntu no problem, but any larger than that and I get this:
>
>  >>> data = np.memmap(2.1GBfname, dtype=np.uint8, mode='r')
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/lib/python2.5/site-packages/numpy/core/memmap.py", line
> 67, in __new__
>     mm = mmap.mmap(fid.fileno(), bytes, access=acc)
> OverflowError: cannot fit 'long' into an index-sized integer
>
> The OverflowError is on the bytes argument. If I try doing the mmap.mmap
> directly in Python, I get the same error. So I guess it's due to me
> running 32bit ubuntu.
>
Hi,
reading this thread I have two comments.
a) *Displaying* at 200Hz probably makes little sense, since humans
would only see about 30Hz at most (i.e. video frame rate).
Consequently you would want to decouple the display from your data
frame rate: save the data to disk at the full rate (as I understand,
that is what you want) and, asynchronously, "display as many frames
as you can". (I have used pyOpenGL for this with great satisfaction.)
b) To my knowledge, on a 32-bit machine any OS (Linux, Windows or
OS X) can allocate at most about 1GB of data.
The actual limits I measured varied from about 700MB to maybe 1.3GB.
In other words, you would be right at the limit.
(For 64-bit, you would have to make sure ALL parts are 64-bit: the
Python version must be >=2.5, Python must have been compiled using a
64-bit compiler, *and* the Windows version must be 64-bit, e.g. XP-64.)
The same holds for both physical RAM allocation and memmap
allocation.
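
For point a), here is a minimal sketch of what decoupling the data rate
from the display rate could look like. Everything in it is hypothetical:
the `LatestFrame` class and its method names are made up for
illustration, and the frames are plain integers standing in for image
arrays. The data side publishes every frame; the display side picks up
whatever is newest when it is ready to draw.

```python
import threading


class LatestFrame:
    """Thread-safe slot that keeps only the most recently published frame."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None
        self._count = 0  # number of frames published so far

    def publish(self, frame):
        """Called by the data thread at the full data rate."""
        with self._lock:
            self._frame = frame
            self._count += 1

    def latest(self):
        """Called by the display loop; returns (count, newest frame)."""
        with self._lock:
            return self._count, self._frame


def produce(slot, nframes):
    # Stand-in for the real data source: frames are just integers here.
    for i in range(nframes):
        slot.publish(i)


slot = LatestFrame()
producer = threading.Thread(target=produce, args=(slot, 1000))
producer.start()
producer.join()

# The display side only sees the newest frame, however many were published.
count, frame = slot.latest()
```

In a real application the display loop would call `slot.latest()` once
per redraw while the producer is still running, and skip the redraw if
`count` has not changed since the last call.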
My solution to this was to "wait" for the 64bit .... not tested yet ;-)
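
To see which case you are in, a minimal sketch along these lines can
report whether the running Python interpreter is a 32-bit or 64-bit
build. `pointer_bits` and `likely_too_big` are hypothetical helper
names, and the 1GB threshold is only the rough figure from b), not a
hard limit.

```python
import struct

# Rough 32-bit allocation limit from point b); an assumption, not a hard limit.
ROUGH_32BIT_LIMIT = 1024 ** 3  # ~1GB


def pointer_bits():
    """Pointer width of the running interpreter, in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def likely_too_big(nbytes, limit=ROUGH_32BIT_LIMIT):
    """Guess whether nbytes exceeds what a 32-bit process can allocate."""
    return pointer_bits() == 32 and nbytes > limit


print("interpreter is %d-bit" % pointer_bits())
print("1.3GB file likely too big: %s" % likely_too_big(int(1.3 * 1024 ** 3)))
```

Note that this checks the interpreter, not the OS: a 32-bit Python on a
64-bit OS still reports 32.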

Cheers,
Sebastian Haase

