[Scipy-tickets] [SciPy] #1677: Support anonymous mmap in numpy.memmap()
SciPy Trac
scipy-tickets@scipy....
Thu Jun 14 11:12:43 CDT 2012
#1677: Support anonymous mmap in numpy.memmap()
--------------------------+-------------------------------------------------
Reporter: iandavis | Owner: somebody
Type: enhancement | Status: new
Priority: normal | Milestone: 0.11.0
Component: scipy.sparse | Version: 0.10.0
Keywords: |
--------------------------+-------------------------------------------------
As far as I can tell, numpy.memmap() does not currently support anonymous
mode (MAP_ANONYMOUS for C's mmap), where there is no backing file.
However, I've found MAP_ANON useful for creating sparse arrays larger than
available memory, where most array entries are zero. It's fast and
efficient, and doesn't use any disk space (or cause any disk IO) unless
you outgrow RAM. For certain applications, it has major advantages over
scipy.sparse -- for instance, it can handle any shape and number of
dimensions efficiently, while scipy.sparse only does 2-D matrices.
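To make the idea concrete, here is a minimal sketch of an anonymous mapping wrapped in an N-dimensional array. It assumes Python's standard mmap module (where a file descriptor of -1 requests anonymous memory) and that the platform hands back zero-filled pages, as the common ones do. The shape, dtype and variable names are illustrative only, and the size is kept small so the sketch runs anywhere.

```python
import mmap

import numpy as np

# Hypothetical example sizes, kept small (8 MiB of address space).
shape = (64, 128, 256)
dtype = np.dtype(np.float32)
nbytes = dtype.itemsize * int(np.prod(shape))

# fileno=-1 asks for an anonymous mapping: no backing file is involved.
mm = mmap.mmap(-1, nbytes)

# Wrap the mapping in a 3-D array without copying.
arr = np.ndarray(shape, dtype=dtype, buffer=mm)

# Write a couple of entries; entries never written still read as zero.
arr[3, 10, 20] = 1.5
arr[63, 127, 255] = -2.0

nonzero_count = int(np.count_nonzero(arr))
total = float(arr.sum())
```

Nothing here is specific to 2-D data, which is the contrast with scipy.sparse drawn above. How lazily the untouched pages are allocated depends on the operating system.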
Anyway, I've adapted numpy.memmap() into the sparse_zeros() class below.
I don't fully understand what all of the original support code was for,
so there may be bugs or places to improve.
I'd propose this be added to either scipy.sparse, or directly to numpy
itself. I didn't see an obvious way to add this to numpy.memmap() because
it expects there to be a file name, but that would be a third possibility.
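For the third possibility, one rough way to picture it is a wrapper that treats a missing file name as a request for an anonymous mapping. The helper name memmap_or_anonymous and its filename=None convention are hypothetical; they are not part of numpy.memmap(), and this is only a sketch of how such a dispatch could look.

```python
import mmap
import os
import tempfile

import numpy as np


def memmap_or_anonymous(filename, dtype=np.uint8, shape=(1,), mode="w+"):
    """Hypothetical helper: filename=None selects an anonymous mapping."""
    if filename is None:
        descr = np.dtype(dtype)
        nbytes = descr.itemsize * int(np.prod(shape))
        mm = mmap.mmap(-1, nbytes)  # -1 means no backing file
        return np.ndarray(shape, dtype=descr, buffer=mm)
    # Otherwise defer to the existing file-backed implementation.
    return np.memmap(filename, dtype=dtype, mode=mode, shape=shape)


# Anonymous case: a plain ndarray view over zero-filled memory.
anon = memmap_or_anonymous(None, dtype=np.int32, shape=(4, 5))

# File-backed case: behaves like numpy.memmap() does today.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "backing.dat")
    on_disk = memmap_or_anonymous(path, dtype=np.int32, shape=(4, 5))
    on_disk_is_memmap = isinstance(on_disk, np.memmap)
    del on_disk  # release the file before the directory is removed
```

A real change to numpy.memmap() would also have to decide what mode, offset and flush() mean without a file, which this sketch leaves out.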
{{{
import warnings

import numpy as np


class sparse_zeros(np.ndarray):
    """
    Copied from numpy.core.memmap (v1.5.1)

    Provides a zeros()-like array backed by an anonymous mmap().
    Only pages with non-zero values require memory storage.
    If enough pages are written to, however, you'll still get swapping.
    """

    __array_priority__ = -100.0

    def __new__(subtype, shape, dtype=np.uint8, order='C'):
        # Import here to minimize 'import numpy' overhead
        import mmap
        descr = np.dtype(dtype)
        if not isinstance(shape, tuple):
            shape = (shape,)
        # Total size of the mapping in bytes (named nbytes so the
        # builtin ``bytes`` is not shadowed)
        nbytes = descr.itemsize
        for k in shape:
            nbytes *= k
        acc = mmap.ACCESS_COPY
        # A file descriptor of -1 requests an anonymous mapping
        mm = mmap.mmap(-1, nbytes, access=acc)
        self = np.ndarray.__new__(subtype, shape, dtype=descr, buffer=mm,
                                  order=order)
        self._mmap = mm
        return self

    def __array_finalize__(self, obj):
        # Views and copies inherit the parent's mmap reference, if any
        if hasattr(obj, '_mmap'):
            self._mmap = obj._mmap
        else:
            self._mmap = None

    def flush(self):
        # Nothing to write back: there is no backing file
        pass

    def sync(self):
        """This method is deprecated, use `flush`."""
        warnings.warn("Use ``flush``.", DeprecationWarning)
        self.flush()

    def _close(self):
        """Close the memmap file. Only do this when deleting the
        object."""
        if self.base is self._mmap:
            # The python mmap probably causes flush on close, but
            # we put this here for safety
            self._mmap.flush()
            self._mmap.close()
            self._mmap = None

    def close(self):
        """Close the memmap file. Does nothing."""
        warnings.warn("``close`` is deprecated on memmap arrays. Use del",
                      DeprecationWarning)

    def __del__(self):
        # We first check if we are the owner of the mmap, rather than
        # a view, so deleting a view does not call _close
        # on the parent mmap. The None check covers arrays that never
        # had an mmap attached.
        if self._mmap is not None and self._mmap is self.base:
            try:
                # First run tell() to see whether file is open
                self._mmap.tell()
            except ValueError:
                pass
            else:
                self._close()
}}}
--
Ticket URL: <http://projects.scipy.org/scipy/ticket/1677>
SciPy <http://www.scipy.org>
SciPy is open-source software for mathematics, science, and engineering.