[Numpy-discussion] Huge arrays
Wed Sep 9 04:55:07 CDT 2009
On Wednesday 09 September 2009 10:48:48 Francesc Alted wrote:
> OTOH, having the possibility to manage compressed data buffers
> transparently in NumPy would help here, but not there yet ;-)
Now that I think about it, if the data is compressible, Daniel could try
defining a compressed PyTables array or table on disk and saving chunks to it.
If the data is compressible enough, the filesystem cache will hold it in
memory until the disk can eventually absorb it.
For this, I would recommend the LZO compressor, as it is one of the
fastest I've seen (at least until Blosc is ready): it can compress up to
5 times faster than the data can be written to disk (depending on how
compressible the data is and on the speed of the disk subsystem).
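
A minimal sketch of the idea, assuming a recent PyTables release (the
snake_case names open_file and create_earray; releases from 2009 spelled
them openFile and createEArray). The file location, the node name "data",
the chunk shape and the stand-in chunks are all hypothetical, and the code
falls back to zlib when the PyTables build does not include LZO:

```python
import os
import tempfile

import numpy as np
import tables

# Prefer LZO when this PyTables build includes it; otherwise fall back to zlib.
complib = "lzo" if tables.which_lib_version("lzo") is not None else "zlib"
filters = tables.Filters(complevel=1, complib=complib)

path = os.path.join(tempfile.mkdtemp(), "huge.h5")  # hypothetical location
ncols = 1000                                        # hypothetical row width

with tables.open_file(path, mode="w") as h5:
    # Extendable array: the dimension declared with length 0 grows on append.
    earr = h5.create_earray(h5.root, "data", atom=tables.Float64Atom(),
                            shape=(0, ncols), filters=filters)
    for i in range(10):
        # Stand-in for one chunk of real data (constant, so very compressible).
        chunk = np.full((100, ncols), float(i))
        earr.append(chunk)
    nrows = earr.nrows

# Read back a small slice; only the HDF5 chunks it touches are decompressed.
with tables.open_file(path, mode="r") as h5:
    last = h5.root.data[-1, :5]
```

Each append() call compresses the chunk before it reaches the filesystem,
so only the compressed bytes have to fit in the cache or on disk.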
Of course, if the data is not compressible at all, then this avenue doesn't
make a lot of sense.
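
One rough way to judge this beforehand is to compress a sample of the data
and look at the ratio. The sketch below uses zlib from the standard library
as a stand-in for LZO, so the exact ratios will differ from what LZO gives;
the helper name compression_ratio and the two sample arrays are
illustrative assumptions only:

```python
import zlib

import numpy as np


def compression_ratio(arr, level=1):
    """Return raw size divided by zlib-compressed size of the array's bytes."""
    raw = np.ascontiguousarray(arr).tobytes()
    return len(raw) / len(zlib.compress(raw, level))


rng = np.random.default_rng(0)
constant = np.zeros(100_000)      # constant data: compresses very well
noise = rng.random(100_000)       # random float64: barely compresses

ratio_constant = compression_ratio(constant)
ratio_noise = compression_ratio(noise)
```

A ratio close to 1 on a representative sample suggests that compression
will cost CPU time without relieving the disk.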