[Numpy-discussion] Tiling / disk storage for matrix in numpy?

Francesc Altet faltet at carabos.com
Wed Apr 12 01:51:12 CDT 2006

On Friday 07 April 2006 19:30, Webb Sprague wrote:
> Hi all,
> Is there a way in numpy to associate a (large) matrix with a disk
> file, then tile and index it, then cache it as you process the
> various pieces?  This is pretty important with massive image files,
> which can't fit into working memory, but in which (for example) you
> might be doing a convolution on a 100 x 100 pixel window on a small
> subset of the image.
> I know that caching algorithms are (1) complicated and (2) never
> general.  But there you go.
> Perhaps I can't find it, perhaps it would be a good project for the
> future?  If HDF or something does this already, could someone point me
> in the right direction?

In addition to using shared memory arrays, you may also want to
experiment with compressing images on disk and reading small chunks
to operate on them in memory. This has the advantage that, if your
image is compressible enough (and most are, to a fair degree), the
total size of the image file will be smaller, leaving more room in
the underlying OS filesystem cache to hold larger areas of the image.
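
The size argument can be checked on a small scale. What follows is a
rough sketch, not PyTables code: it compresses a synthetic image tile
by tile with zlib and compares the stored size with the raw size. The
helper names (compress_tiles, read_tile), the 128-pixel tile size, the
in-memory dict standing in for a file, and the smooth test data are
all assumptions made for illustration; real images will compress
differently.

```python
import zlib
import numpy

def compress_tiles(image, tile):
    """Compress each tile x tile block separately, keyed by its origin."""
    store = {}
    for r in range(0, image.shape[0], tile):
        for c in range(0, image.shape[1], tile):
            block = numpy.ascontiguousarray(image[r:r+tile, c:c+tile])
            store[(r, c)] = (block.shape, zlib.compress(block.tobytes(), 1))
    return store

def read_tile(store, origin, dtype):
    """Decompress a single tile without touching the others."""
    shape, payload = store[origin]
    return numpy.frombuffer(zlib.decompress(payload), dtype=dtype).reshape(shape)

# Synthetic, smooth (hence compressible) 1024x1024 image.
image = (numpy.arange(1024, dtype='int32').reshape(1024, 1)
         + numpy.arange(1024, dtype='int32'))

store = compress_tiles(image, 128)
raw_size = image.nbytes
stored_size = sum(len(payload) for _, payload in store.values())
print(raw_size, stored_size)

# Only the requested tile is decompressed here.
tile = read_tile(store, (128, 256), image.dtype)
```

Compressing tiles independently is what makes partial reads cheap:
fetching one tile never requires decompressing its neighbours.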

Here is a small PyTables program that illustrates the concept:

import tables
import numpy

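# Create a container for the image in file
f=tables.openFile('image.h5', 'w')
img=f.createEArray(f.root, 'img',
                   tables.Atom(shape=(1024,0), dtype='Int32', flavor='numpy'),
                   # Compress on disk (the level here is an assumed value)
                   filters=tables.Filters(complevel=1))
# Add 1024 slices to the image along its enlargeable (0-length) dimension
for i in xrange(1024):
    # Illustrative data only; each append adds one 1024x1 slice
    img.append(numpy.arange(1024, dtype='int32').reshape(1024, 1) + i)
img.flush()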
# Get small chunks of the image in memory and operate with them
cs = 100
for i in xrange(0, 1024-2*cs, cs):
    # Get 100x100 squares
    chunk1 = img[i:i+cs, i:i+cs]
    chunk2 = img[i+cs:i+2*cs, i+cs:i+2*cs]
    chunk3 = chunk1*chunk2  # Trivial operation with them

f.close()
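
For comparison, the same tile-by-tile access can be sketched without
compression by mapping a raw file with numpy.memmap. This is only a
sketch under assumptions: the file name image.raw, the temporary
directory, and the fill values are hypothetical, and the loop simply
mirrors the diagonal 100x100 squares used in the program above.

```python
import os
import tempfile
import numpy

# Hypothetical file name; a temporary directory keeps the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), 'image.raw')

# Create a 1024x1024 int32 image backed by a file on disk.
img = numpy.memmap(path, dtype='int32', mode='w+', shape=(1024, 1024))
for i in range(1024):
    img[i, :] = numpy.arange(1024, dtype='int32') + i  # illustrative data
img.flush()
del img

# Reopen read-only; the OS pages in only the regions that are touched.
img = numpy.memmap(path, dtype='int32', mode='r', shape=(1024, 1024))

cs = 100
results = []
for i in range(0, 1024 - 2*cs, cs):
    # Copy two 100x100 squares into ordinary in-memory arrays
    chunk1 = numpy.array(img[i:i+cs, i:i+cs])
    chunk2 = numpy.array(img[i+cs:i+2*cs, i+cs:i+2*cs])
    results.append(chunk1 * chunk2)  # trivial operation with them
print(len(results), results[0].shape)
```

The trade-off is that a memory-mapped file occupies its full
uncompressed size on disk and in the filesystem cache, whereas the
compressed container reads fewer bytes from disk at the cost of
decompressing each chunk.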



>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
