[Numpy-discussion] Memory mapped files in scipy core
oliphant at ee.byu.edu
Sun Nov 20 01:04:01 CST 2005
I would appreciate understanding typical use cases for memory-mapped
files. I am not sure I understand why certain choices were made for
numarray's memmap and memmap slice classes. They seem like a lot of
"extra" stuff and I'm not sure what all that stuff is for.
Rather than just copy these over, I would like to understand what people
typically want to do with memory-mapped files to see if scipy core
doesn't already provide that.
For example, right now I can open a file, use mmap to obtain a memory
map object and then pass that object into frombuffer in scipy_core to
get an ndarray whose memory maps a file on disk.
Now, this ndarray can be sliced and indexed and manipulated all the
while referring to the file on disk (well technically, I suppose, the
memory-mapped object would need to be flushed to synchronize).
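A minimal sketch of the open / mmap / frombuffer sequence described above, written against present-day NumPy (an assumption; the post itself refers to frombuffer in scipy_core). The scratch file, its name, and its contents are made up purely for illustration.

```python
import mmap
import os
import tempfile

import numpy as np

# Hypothetical scratch file holding eight float64 zeros.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
np.zeros(8, dtype=np.float64).tofile(path)

with open(path, "r+b") as f:
    # Map the whole file read/write (length 0 means "entire file").
    mm = mmap.mmap(f.fileno(), 0)

    # Build an ndarray whose memory is the mapping itself, not a copy.
    arr = np.frombuffer(mm, dtype=np.float64)

    # Slicing and assignment operate directly on the mapped memory.
    arr[2:5] = [1.0, 2.0, 3.0]

    # Flush so the file on disk is synchronized with the mapping.
    mm.flush()

    # Drop the array before closing; an mmap cannot be closed while
    # another object still holds a buffer exported from it.
    del arr
    mm.close()

# Read the file back through ordinary I/O to inspect what was written.
result = np.fromfile(path, dtype=np.float64)
```

Note that the array does not own the mapping, so the mmap object has to outlive it; this bookkeeping is the sort of thing a small convenience class could hide.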
Now, I could see wanting to make the process of opening the file,
getting the mmap object and setting its buffer to the array object a
little easier. Thus, a simple memmap class would be a useful construct
-- I could even see it inheriting from the ndarray directly and adding a
few methods. I guess I just don't see why one would care about a
memory-mapped slice object, when the mmaparray sub-class would be
On a related, but orthogonal note:
My understanding is that using memory-mapped files for *very* large
files will require modification to the mmap module in Python ---
something I think we should push. One part of that process would be to
add the C-struct array interface to the mmap module and the buffer
object -- perhaps this is how we get the array interface into Python
quickly. Then, if we could make a base-type mmap that did not use the
buffer interface or the sequence interface (similar to the bigndarray in
scipy_core) and therefore by-passed the problems with Python in those
areas, then the current mmap object could inherit from the base class
and provide current functionality while still exposing the array
interface for access to >2GB files on 64-bit systems.
Who would like to take up the ball for modifying mmap in Python in this