[Numpy-discussion] Memory mapped files in scipy core
jmiller at stsci.edu
Mon Nov 21 07:15:00 CST 2005
Travis Oliphant wrote:
> I would appreciate understanding typical use cases for memory-mapped
> files. I am not sure I understand why certain choices were made for
> numarray's memmap and memmap slice classes. They seem like a lot of
> "extra" stuff and I'm not sure what all that stuff is for.
> Rather than just copy these over, I would like to understand what
> people typically want to do with memory-mapped files to see if scipy
> core doesn't already provide that.
> For example, right now I can open a file, use mmap to obtain a memory
> map object and then pass that object into frombuffer in scipy_core to
> get an ndarray whose memory maps a file on disk.
> Now, this ndarray can be sliced and indexed and manipulated all the
> while referring to the file on disk (well technically, I suppose, the
> memory-mapped object would need to be flushed to synchronize).
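A minimal sketch of the workflow described above, written against present-day Python 3 and NumPy rather than 2005-era scipy_core (so `np.frombuffer` stands in for scipy_core's `frombuffer`). The helper name `map_file` is hypothetical, and the sketch assumes the file already exists, is non-empty, and has a size that is a multiple of the dtype's item size.

```python
import mmap
import numpy as np

def map_file(path, dtype=np.float64):
    """Return (array, mmap) where the array is a zero-copy view of the file."""
    f = open(path, "r+b")
    try:
        # Length 0 maps the whole file; "r+b" gives a writable, shared mapping.
        mm = mmap.mmap(f.fileno(), 0)
    finally:
        # The mapping stays valid after the file object is closed.
        f.close()
    # frombuffer does not copy: indexing and slicing the array touch the mapped pages.
    arr = np.frombuffer(mm, dtype=dtype)
    return arr, mm
```

Writes through `arr` land in the mapping; calling `mm.flush()` pushes them to disk. Because the array exports the mapping's buffer, `mm.close()` raises `BufferError` while `arr` is alive, so drop the array before closing the map.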
> Now, I could see wanting to make the process of opening the file,
> getting the mmap object and setting its buffer to the array object a
> little easier. Thus, a simple memmap class would be a useful
> construct -- I could even see it inheriting from the ndarray directly
> and adding a few methods. I guess I just don't see why one would
> care about a memory-mapped slice object, when the mmaparray sub-class
> would be perfectly useful.
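One way the "simple memmap class" idea could look, sketched as an ndarray subclass. The class name `MappedArray` and its `flush()` method are illustrative, not numarray's or scipy_core's API (present-day NumPy ships a comparable built-in subclass, `numpy.memmap`). The sketch assumes the file exists and is at least `prod(shape) * itemsize` bytes long.

```python
import mmap
import numpy as np

class MappedArray(np.ndarray):
    """Hypothetical ndarray subclass whose data lives in a memory-mapped file."""

    def __new__(cls, path, dtype, shape):
        dtype = np.dtype(dtype)
        nbytes = dtype.itemsize * int(np.prod(shape))
        with open(path, "r+b") as f:
            # Map only the bytes the array needs; the mapping outlives the file object.
            mm = mmap.mmap(f.fileno(), nbytes)
        # Build the array directly on top of the mapping (no copy).
        obj = super().__new__(cls, shape, dtype=dtype, buffer=mm)
        obj._mmap = mm
        return obj

    def __array_finalize__(self, obj):
        # Slices and other views inherit the mapping, so they can flush too.
        self._mmap = getattr(obj, "_mmap", None)

    def flush(self):
        """Write dirty pages of the underlying mapping back to the file."""
        if self._mmap is not None:
            self._mmap.flush()
```

`__array_finalize__` is what lets a slice of a `MappedArray` carry the mapping along, which covers the slicing case described above as long as the file layout itself never changes.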
There are a few extra capabilities which are supported in numarray's memmap:
1. slice insertion
2. slice deletion
3. memmap based array resizing
Each of these things implicitly changes the layout of the underlying
file. Whether or not these features get used or justify the complexity
is another matter. Because of 32-bit address space exhaustion and swap
file issues, memory mapping was a disappointment at STSCI so I'm
doubtful we used these features ourselves.
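To make concrete why slice deletion changes the layout of the underlying file, here is a sketch of deleting a run of fixed-size records with the standard-library `mmap` module. It is not numarray's implementation; the function name `delete_records` and the fixed-record file format are assumptions for illustration.

```python
import mmap
import os

def delete_records(path, itemsize, start, stop):
    """Delete records [start, stop) from a file of fixed-size records, in place."""
    size = os.path.getsize(path)
    nrec = size // itemsize
    if not 0 <= start <= stop <= nrec:
        raise IndexError("record range out of bounds")
    if start == stop:
        return  # nothing to delete
    src, dst = stop * itemsize, start * itemsize
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mm:
            # Shift everything after the deleted slice down; the ranges may overlap.
            mm.move(dst, src, size - src)
            mm.flush()
        # Close the mapping before shrinking the file; Windows refuses to
        # truncate a file that is still mapped.
        f.truncate(size - (stop - start) * itemsize)
```

Every record after `stop` ends up at a new byte offset and the file shrinks, so any array still viewing the old mapping would be stale. Keeping such views consistent is the kind of bookkeeping these extra memmap features carry.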
> On a related, but orthogonal note:
> My understanding is that using memory-mapped files for *very* large
> files will require modification to the mmap module in Python ---
> something I think we should push. One part of that process would be
> to add the C-struct array interface to the mmap module and the buffer
> object -- perhaps this is how we get the array interface into Python
> quickly. Then, if we could make a base-type mmap that did not use
> the buffer interface or the sequence interface (similar to the
> bigndarray in scipy_core) and therefore bypassed the problems with
> Python in those areas, then the current mmap object could inherit from
> the base class and provide current functionality while still exposing
> the array interface for access to >2GB files on 64-bit systems.
> Who would like to take up the ball for modifying mmap in Python in
> this fashion?
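A rough Python-level illustration of the array-interface idea: an object can hand NumPy a raw `(address, read_only)` pair through `__array_interface__` instead of going through the buffer protocol. This uses the dict form of the protocol rather than the C-struct form mentioned above, and ctypes-owned memory stands in for an mmap region; the class name `PointerExporter` is hypothetical.

```python
import ctypes
import numpy as np

class PointerExporter:
    """Hypothetical object exposing its memory via the array interface."""

    def __init__(self, n):
        # ctypes owns the storage here; in the proposal this would be mapped memory.
        self._storage = (ctypes.c_double * n)()
        self._n = n

    @property
    def __array_interface__(self):
        return {
            "version": 3,
            "shape": (self._n,),
            "typestr": np.dtype(np.float64).str,  # native-endian 8-byte float
            # (pointer, read-only flag): no buffer protocol involved.
            "data": (ctypes.addressof(self._storage), False),
        }
```

`np.asarray(PointerExporter(3))` yields a writable float64 array that views the same memory, so assignments to the array show up in `_storage`.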
> Numpy-discussion mailing list
> Numpy-discussion at lists.sourceforge.net