[Numpy-discussion] Huge arrays
Wed Sep 9 03:48:48 CDT 2009
On Wednesday 09 September 2009 07:22:33, David Cournapeau wrote:
> On Wed, Sep 9, 2009 at 2:10 PM, Sebastian Haase <firstname.lastname@example.org> wrote:
> > Hi,
> > you can probably use PyTables for this. Even though it's meant to
> > save/load data to/from disk (in HDF5 format) as far as I understand,
> > it can be used to make your task solvable - even on a 32bit system !!
> > It's free (pytables.org) -- so maybe you can try it out and tell me if
> > I'm right ....
> You still would not be able to load a NumPy array > 2 GB. NumPy's memory
> model needs one contiguously addressable chunk of memory for the data,
> which is limited under 32-bit archs. This cannot be overcome in
> any way, AFAIK.
> You may be able to save data > 2 GB by appending several chunks < 2
> GB to disk - maybe PyTables supports this if it has large file support
> (which enables writing files > 2 GB on a 32-bit system).
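The chunked-append approach David describes can be sketched in plain NumPy: write each sub-2 GB chunk to a single binary file, then map the file back with `np.memmap` so the data is never materialized as one in-memory array. The file name and chunk sizes below are made up for illustration.

```python
import os
import tempfile
import numpy as np

# Illustrative only: tiny chunks stand in for the sub-2 GB pieces.
path = os.path.join(tempfile.mkdtemp(), "big.dat")

n_chunks, chunk_len = 4, 1000
with open(path, "wb") as f:
    for i in range(n_chunks):
        chunk = np.full(chunk_len, i, dtype=np.float64)
        chunk.tofile(f)  # appends raw bytes; only one chunk in RAM at a time

# np.memmap maps the file lazily into the address space, so the whole
# dataset can be indexed like an array without loading it all at once.
data = np.memmap(path, dtype=np.float64, mode="r")
print(data.shape)         # (4000,)
print(float(data[2500]))  # 2.0 (element from the third chunk)
```

On a 32-bit system the mapped file itself is still bounded by the process address space, which is why the thread recommends a 64-bit OS for truly huge arrays.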
Yes, the latter is supported in PyTables as long as the underlying filesystem
supports files > 2 GB, which is common in modern operating systems. This
even works on 32-bit systems, as the indexing machinery in Python has been
completely replaced inside PyTables.
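As a sketch of how this looks in practice, a PyTables extendable array (EArray) grows on disk one append at a time, so no single chunk ever has to fit the 32-bit limit. This uses the modern PyTables method names (`open_file`, `create_earray`); the file name and shapes are placeholders.

```python
import os
import tempfile
import numpy as np
import tables  # PyTables

path = os.path.join(tempfile.mkdtemp(), "big.h5")

# Create an extendable array: axis of length 0 is the growable one.
with tables.open_file(path, mode="w") as h5:
    earr = h5.create_earray(h5.root, "data",
                            atom=tables.Float64Atom(),
                            shape=(0, 1000))
    for _ in range(10):
        earr.append(np.random.rand(100, 1000))  # each append stays small

# Slicing reads only the requested rows back from disk.
with tables.open_file(path, mode="r") as h5:
    block = h5.root.data[500:510]
    print(block.shape)  # (10, 1000)
```

Each `append` writes through to the HDF5 file, so the total on-disk size is limited only by the filesystem, not by addressable memory.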
However, I think that what Daniel is trying to achieve is to keep
all the info in memory because writing it to disk is too slow. I also agree
that your suggestion to use a 64-bit OS (or 32-bit Linux, which can address
the full 3 GB out of the box, as Chuck said) is the way to go.
OTOH, having the possibility to manage compressed data buffers transparently
in NumPy would help here, but we are not there yet ;-)
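To make the idea concrete, here is a minimal, hypothetical sketch of what a compressed in-memory buffer could look like: each chunk is kept zlib-compressed in RAM and inflated only on access. The class name and interface are invented for illustration; this is not a NumPy or PyTables API.

```python
import zlib
import numpy as np

class CompressedChunks:
    """Toy container that stores array chunks zlib-compressed in memory."""

    def __init__(self, dtype=np.float64):
        self.dtype = np.dtype(dtype)
        self._chunks = []  # list of compressed byte strings

    def append(self, arr):
        raw = np.ascontiguousarray(arr, dtype=self.dtype).tobytes()
        self._chunks.append(zlib.compress(raw))

    def __getitem__(self, i):
        # Decompress only the requested chunk.
        raw = zlib.decompress(self._chunks[i])
        return np.frombuffer(raw, dtype=self.dtype)

store = CompressedChunks()
store.append(np.zeros(100_000))  # highly compressible data

compressed_size = len(store._chunks[0])
print(compressed_size)   # far less than the 800,000 raw bytes
print(store[0].shape)    # (100000,)
```

The win obviously depends on how compressible the data is; for the nearly constant arrays common in simulation output, the savings can be large.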