[Numpy-discussion] Tiling / disk storage for matrix in numpy?
faltet at carabos.com
Wed Apr 12 01:51:12 CDT 2006
On Friday 07 April 2006 19:30, Webb Sprague wrote:
> Hi all,
> Is there a way in numpy to associate a (large) matrix with a disk
> file, then tile and index it, then cache it as you process the
> various pieces? This is pretty important with massive image files,
> which can't fit into working memory, but in which (for example) you
> might be doing a convolution on a 100 x 100 pixel window on a small
> subset of the image.
> I know that caching algorithms are (1) complicated and (2) never
> general. But there you go.
> Perhaps I can't find it, perhaps it would be a good project for the
> future? If HDF or something does this already, could someone point me
> in the right direction?
In addition to using shared memory arrays, you may also want to
experiment with compressing images on-disk and reading small chunks
to operate on in-memory. This has the advantage that, if your image
is compressible enough (and most images are fairly compressible), the
total on-disk size will be smaller, leaving more room for the
underlying OS filesystem cache to hold larger areas of the image.
Here is a small PyTables program that illustrates the concept:
import tables
import numpy

f = tables.openFile("image.h5", mode="w")
# Create a container for the image in file (the 0 marks the
# extensible dimension); compress it with zlib level 1
img = f.createEArray(f.root, 'img',
                     tables.Atom(shape=(1024, 0), dtype='Int32',
                                 flavor='numpy'),
                     filters=tables.Filters(complevel=1))
# Add 1024 columns to the image (any int32 data works here)
for i in xrange(1024):
    img.append(numpy.zeros((1024, 1), dtype='int32') + i)
# Get small chunks of the image in memory and operate with them
cs = 100
for i in xrange(0, 1024-2*cs, cs):
    # Get 100x100 squares
    chunk1 = img[i:i+cs, i:i+cs]
    chunk2 = img[i+cs:i+2*cs, i+cs:i+2*cs]
    chunk3 = chunk1*chunk2  # Trivial operation with them
f.close()
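If you do not need compression, a sketch of another option (not from the
original post) is numpy.memmap, which maps a raw binary file directly as
an array and lets the OS page the touched windows in and out on demand.
The file name and fill values below are made up for illustration:

```python
import os
import tempfile
import numpy as np

# Hypothetical file name, chosen for illustration only
fname = os.path.join(tempfile.mkdtemp(), "image.dat")

# Create a 1024x1024 int32 array backed by a plain file on disk
n = 1024
img = np.memmap(fname, dtype="int32", mode="w+", shape=(n, n))
img[:] = 2          # example fill data, written through to the file
img.flush()

# Reopen read-only; slicing returns a view, so only the pages that
# hold the requested window are actually read from disk
img = np.memmap(fname, dtype="int32", mode="r", shape=(n, n))
cs = 100
chunk1 = img[0:cs, 0:cs]
chunk2 = img[cs:2*cs, cs:2*cs]
chunk3 = chunk1 * chunk2    # operate on the small windows in memory
```

The trade-off versus the PyTables approach is that the file is stored
uncompressed, so nothing is gained in filesystem-cache coverage.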
>0,0< Francesc Altet http://www.carabos.com/
V V Cárabos Coop. V. Enjoy Data