[Numpy-discussion] [ANN] carray: an in-memory compressed data container
Sat Aug 21 18:29:41 CDT 2010
2010/8/21, Sebastian Haase <email@example.com>:
> Hi Francesc,
> another exciting project ... congratulations !
> Am I correct in thinking that memmapping a carray would also be a
> great speed advantage over memmapped ndarrays ? Let's say I have a
> 2 GB ndarray memmapped over an NFS network connection, should the
> speed increase simply scale with the compression factor ?
Mmh, in principle yes. However, carray is based on the concept of
independent chunks of data and, frankly, it does not make much sense
to me to create many small memmapped files just to hold the chunks.
Instead, I'd use PyTables (what else? ;-) for this, because it is
based on the same chunk concept as carray, but the chunks are saved in
a single monolithic (HDF5) file, which is much easier to handle. These
chunks can be compressed with Blosc too, so I/O is fast (although,
because of the HDF5 overhead, a compressed memmap approach would
probably be faster still, but much more difficult to manage). And last
but not least, this approach is not limited by the size of virtual
memory the way memmapped solutions are, a limitation I find quite
uncomfortable.
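
To make the "independent chunks in one monolithic file" idea concrete,
here is a minimal sketch in plain Python. It is not how carray or
PyTables is implemented: zlib stands in for Blosc, a flat file with an
in-memory index stands in for HDF5, and the names write_chunked,
read_range and CHUNK_SIZE are made up for illustration. The point is
only that a read touches (and decompresses) just the chunks that
overlap the requested range.

```python
import io
import struct
import zlib

# Uncompressed bytes per chunk. Hypothetical value chosen for illustration.
CHUNK_SIZE = 64 * 1024


def write_chunked(fileobj, data, chunk_size=CHUNK_SIZE):
    """Compress data chunk by chunk into a single file object.

    Returns an index of (offset, compressed_length) pairs, one per chunk.
    """
    index = []
    for start in range(0, len(data), chunk_size):
        compressed = zlib.compress(data[start:start + chunk_size])
        index.append((fileobj.tell(), len(compressed)))
        fileobj.write(compressed)
    return index


def read_range(fileobj, index, start, stop, chunk_size=CHUNK_SIZE):
    """Return data[start:stop], decompressing only the overlapping chunks."""
    if start >= stop or not index:
        return b""
    first = start // chunk_size
    if first >= len(index):
        return b""
    # Clamp to the last stored chunk in case stop runs past the end.
    last = min((stop - 1) // chunk_size, len(index) - 1)
    parts = []
    for i in range(first, last + 1):
        offset, length = index[i]
        fileobj.seek(offset)
        parts.append(zlib.decompress(fileobj.read(length)))
    block = b"".join(parts)
    base = first * chunk_size
    return block[start - base:stop - base]


# Example: 1 MiB of highly compressible data in an in-memory "file".
data = struct.pack("<I", 7) * (256 * 1024)
store = io.BytesIO()
index = write_chunked(store, data)

# Only one chunk is read and decompressed to serve these 16 bytes.
piece = read_range(store, index, 100_000, 100_016)
```

Over a slow link such as NFS, the bytes transferred for a read are the
compressed sizes of the chunks touched, which is where the gain from
the compression factor would come from, minus the decompression cost.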