[Numpy-discussion] Issues with the memmap object

Sturla Molden sturla@molden...
Mon Jun 18 10:30:39 CDT 2007


After struggling with NumPy's memmap object, I examined the code and 
detected three severe problems. I suggest that memmap be removed from 
NumPy, at least on Windows, as its shortcomings are severe and 
undocumented.


Problem 1: I/O errors are never detected on Win32:

On Windows, I/O errors on memory mapped objects are reported through 
structured exception handling (SEH). Neither NumPy nor Python uses 
structured exception handling on Win32. This means that I/O errors (such 
as network or disk failure) will go undetected and become a source of 
obscure bugs.

The bugfix for this is to wrap any access to a PyArrayObject's 
"data" pointer in __try and __except blocks, and to build with an MSVC 
compiler on Windows. GCC/MinGW cannot be used, as it does not support 
structured exception handling. In other words,

PyArrayObject *memmap;

__try {

    /* safe read/write access to memmap->data here */

}
__except( GetExceptionCode() == EXCEPTION_IN_PAGE_ERROR ?
    EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH) {

    /* Windows signaled an I/O error, handle the problem here */

}

Not only must NumPy itself be rewritten, but also any library getting a 
data pointer from a NumPy memmap array. Fixing this will be extremely 
difficult, if not impossible. The only safe way to access file data from 
NumPy is through numpy.fromfile() and numpy.ndarray.tofile().
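
As a rough illustration of the explicit-I/O route, here is a minimal sketch that writes an array with tofile() and reads it back with numpy.fromfile(). The helper name roundtrip is hypothetical, and catching OSError is an assumption about how a failed read or write is reported. The point is only that a failure surfaces as an ordinary Python exception at the call site rather than as a fault on a later pointer access.

```python
import os
import tempfile

import numpy as np


def roundtrip(values):
    """Write values to a temporary file and read them back with explicit I/O.

    Hypothetical helper for illustration only.
    """
    data = np.asarray(values, dtype=np.float64)
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)  # only the path is needed; NumPy opens the file itself
    try:
        try:
            data.tofile(path)                            # explicit write
            loaded = np.fromfile(path, dtype=np.float64)  # explicit read
        except OSError as exc:
            # Assumption: I/O failures are raised as OSError here.
            raise RuntimeError("file I/O failed: %s" % exc) from exc
        return loaded
    finally:
        os.remove(path)
```

The data returned is a copy held in ordinary memory, so later access to it cannot trigger a page-in error.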


Problem 2: Mapping always starts from the beginning of the file:

Python's standard mmap object always maps from the beginning of the file, regardless 
of the size. NumPy's memmap object depends on Python's mmap through the 
buffer protocol. Even though NumPy's memmap object takes an offset 
parameter, the actual memory mapping starts from the beginning of the 
file. Thus, virtual memory equal to the memmap object's offset parameter 
will be leaked until the memmap object is deleted.
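
For comparison, here is a minimal sketch of what an offset-aware mapping would do: round the requested offset down to the allocation granularity, map only from that boundary, and skip the few leftover bytes. It assumes a Python version whose mmap.mmap accepts an offset argument, which the mmap described above does not. The helper names aligned_window and read_window are hypothetical.

```python
import mmap
import os
import tempfile


def aligned_window(offset, length, granularity=mmap.ALLOCATIONGRANULARITY):
    """Return (map_start, map_length, skip) for a mapping that covers
    [offset, offset + length) but starts on a granularity boundary."""
    map_start = (offset // granularity) * granularity  # round down
    skip = offset - map_start                          # bytes before the data
    return map_start, skip + length, skip


def read_window(path, offset, length):
    """Map only the aligned window around the requested bytes and copy them out."""
    map_start, map_length, skip = aligned_window(offset, length)
    with open(path, "rb") as f:
        m = mmap.mmap(f.fileno(), map_length,
                      access=mmap.ACCESS_READ, offset=map_start)
        try:
            return m[skip:skip + length]
        finally:
            m.close()


def demo():
    """Place a payload past one granularity unit and read it via a small mapping."""
    payload = b"numpy-memmap"
    offset = mmap.ALLOCATIONGRANULARITY + 123
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * offset + payload)
        path = tmp.name
    try:
        return read_window(path, offset, len(payload))
    finally:
        os.remove(path)
```

With this scheme the address space wasted per mapping is less than one granularity unit, instead of growing with the offset.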


Problem 3: No 64 bit support on Windows or Linux:

On Linux, large files must be memory mapped using mmap64 (or mmap2 if 4k 
boundaries are acceptable). On Windows, CreateFileMapping/MapViewOfFile 
has 64 bit support, but Python's mmap does not use it (the high offset 
DWORD is always zero). Only files smaller than 4 GB can be memory mapped.
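
To make the DWORD point concrete: MapViewOfFile takes the 64 bit file offset as two 32 bit halves. The sketch below shows the split in Python. The names split_offset and reachable_with_zero_high are hypothetical helpers for illustration, not part of Python's mmap or NumPy.

```python
DWORD_MASK = 0xFFFFFFFF


def split_offset(offset):
    """Split a 64 bit file offset into (high, low) 32 bit halves."""
    if not 0 <= offset <= 0xFFFFFFFFFFFFFFFF:
        raise ValueError("offset must fit in an unsigned 64 bit integer")
    return (offset >> 32) & DWORD_MASK, offset & DWORD_MASK


def reachable_with_zero_high(offset):
    """True if the offset can be expressed while the high half is forced to zero."""
    high, _low = split_offset(offset)
    return high == 0
```

Any offset at or beyond 2**32 bytes needs a nonzero high half, so it cannot be expressed when that half is hard-wired to zero.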



Regards,

Sturla Molden









