[Scipy-tickets] [SciPy] #1677: Support anonymous mmap in numpy.memmap()

SciPy Trac scipy-tickets@scipy....
Thu Jun 14 11:12:43 CDT 2012


#1677: Support anonymous mmap in numpy.memmap()
--------------------------+-------------------------------------------------
 Reporter:  iandavis      |       Owner:  somebody
     Type:  enhancement   |      Status:  new     
 Priority:  normal        |   Milestone:  0.11.0  
Component:  scipy.sparse  |     Version:  0.10.0  
 Keywords:                |  
--------------------------+-------------------------------------------------
 As far as I can tell, numpy.memmap() does not currently support anonymous
 mode (MAP_ANONYMOUS for C's mmap), where there is no backing file.
 However, I've found MAP_ANONYMOUS useful for creating sparse arrays larger
 than available memory, where most array entries are zero.  It's fast, and
 because pages are only allocated once they are written to, it uses no disk
 space (and causes no disk IO) unless the written pages outgrow RAM and
 start swapping.  For certain applications, it has major advantages over
 scipy.sparse -- for instance, it can handle any shape and number of
 dimensions efficiently, while scipy.sparse only does 2-D matrices.
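
 To make the idea concrete, here is a minimal sketch using only the
 standard-library mmap module and a plain ndarray.  The helper name
 anon_zeros is hypothetical (it is not part of numpy or scipy), and the
 sketch assumes a platform where mmap.mmap(-1, n) yields a zero-filled
 anonymous mapping.

```python
import mmap

import numpy as np


def anon_zeros(shape, dtype=np.uint8, order='C'):
    """Hypothetical helper: zeros()-like array backed by an anonymous mmap."""
    descr = np.dtype(dtype)
    if not isinstance(shape, tuple):
        shape = (shape,)
    nbytes = descr.itemsize
    for dim in shape:
        nbytes *= dim
    # fileno -1 requests an anonymous mapping, i.e. no backing file.
    buf = mmap.mmap(-1, nbytes)
    # The array uses the mapping as its buffer, so no data is copied.
    return np.ndarray(shape, dtype=descr, buffer=buf, order=order)


a = anon_zeros((1024, 1024), dtype=np.float64)
a[3, 7] = 1.5  # only the page holding this element needs real memory
```

 Unlike the class in this ticket, the sketch returns a plain ndarray and
 leaves cleanup of the mapping to garbage collection.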

 Anyway, I've adapted numpy.memmap() into the sparse_zeros() class below.
 I don't understand what all the support code was for originally, so it's
 possible there are bugs / places to improve.

 I'd propose this be added to either scipy.sparse, or directly to numpy
 itself.  I didn't see an obvious way to add this to numpy.memmap() because
 it expects there to be a file name, but that would be a third possibility.

 {{{
 import warnings
 import numpy as np
 class sparse_zeros(np.ndarray):
     """
     Copied from numpy.core.memmap (v1.5.1)
     Provides a zeros()-like array backed by an anonymous mmap().
     Only pages with non-zero values require memory storage.
     If enough pages are written to, however, you'll still get swapping.
     """
     __array_priority__ = -100.0
     def __new__(subtype, shape, dtype=np.uint8, order='C'):
         # Import here to minimize 'import numpy' overhead
         import mmap
         descr = np.dtype(dtype)
         if not isinstance(shape, tuple): shape = (shape,)
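         # Total size in bytes (avoid shadowing the builtin `bytes`)
         nbytes = descr.itemsize
         for k in shape:
             nbytes *= k
         acc = mmap.ACCESS_COPY
         # fileno -1 requests an anonymous mapping with no backing file
         mm = mmap.mmap(-1, nbytes, access=acc)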
         self = np.ndarray.__new__(subtype, shape, dtype=descr, buffer=mm,
                                   order=order)
         self._mmap = mm
         return self
     def __array_finalize__(self, obj):
         if hasattr(obj, '_mmap'):
             self._mmap = obj._mmap
         else:
             self._mmap = None
     def flush(self): pass
     def sync(self):
         """This method is deprecated, use `flush`."""
         warnings.warn("Use ``flush``.", DeprecationWarning)
         self.flush()
     def _close(self):
         """Close the memmap file.  Only do this when deleting the object."""
         if self.base is self._mmap:
             # The python mmap probably causes flush on close, but
             # we put this here for safety
             self._mmap.flush()
             self._mmap.close()
             self._mmap = None
     def close(self):
         """Close the memmap file. Does nothing."""
         warnings.warn("``close`` is deprecated on memmap arrays.  Use del",
                       DeprecationWarning)
     def __del__(self):
         # We first check if we are the owner of the mmap, rather than
         # a view, so deleting a view does not call _close
         # on the parent mmap
         if self._mmap is self.base:
             try:
                 # First run tell() to see whether file is open
                 self._mmap.tell()
             except ValueError:
                 pass
             else:
                 self._close()
 }}}

-- 
Ticket URL: <http://projects.scipy.org/scipy/ticket/1677>
SciPy <http://www.scipy.org>
SciPy is open-source software for mathematics, science, and engineering.

