[Numpy-discussion] Huge arrays

Francesc Alted faltet@pytables....
Wed Sep 9 03:48:48 CDT 2009


On Wednesday 09 September 2009 07:22:33 David Cournapeau wrote:
> On Wed, Sep 9, 2009 at 2:10 PM, Sebastian Haase<seb.haase@gmail.com> wrote:
> > Hi,
> > you can probably use PyTables for this. Even though it's meant to
> > save/load data to/from disk (in HDF5 format) as far as I understand,
> > it can be used to make your task solvable - even on a 32bit system !!
> > It's free (pytables.org) -- so maybe you can try it out and tell me if
> > I'm right ....
>
> You still would not be able to load a numpy array > 2 GB. NumPy's memory
> model needs one contiguously addressable chunk of memory for the data,
> which is limited on 32-bit architectures. This cannot be overcome in
> any way AFAIK.
>
> You may be able to save data > 2 GB by appending several chunks < 2
> GB to disk - maybe pytables supports this if it has large file support
> (which makes it possible to write files > 2 GB on a 32-bit system).

Yes, the latter is supported in PyTables as long as the underlying filesystem
supports files > 2 GB, which is the norm on modern operating systems.  This
even works on 32-bit systems, as the indexing machinery in Python has been
completely replaced inside PyTables.
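
To make the chunk-appending idea concrete, here is a minimal sketch using an
extendable array (EArray).  It assumes a recent PyTables release (the
open_file/create_earray spelling); the file name, the node name "data" and
the two helper functions are made up for illustration, and the sizes are
kept tiny so that it runs quickly:

```python
import os
import tempfile

import numpy as np
import tables


def append_in_chunks(path, n_chunks, chunk_len):
    """Write n_chunks blocks of chunk_len float64 values to an EArray.

    Hypothetical helper: only one chunk lives in memory at any time.
    """
    with tables.open_file(path, mode="w") as h5:
        # shape=(0,) marks the first dimension as the one that grows.
        earr = h5.create_earray(
            h5.root, "data", atom=tables.Float64Atom(), shape=(0,)
        )
        for i in range(n_chunks):
            chunk = np.full(chunk_len, float(i))
            earr.append(chunk)


def read_slice(path, start, stop):
    """Read back only the requested rows as a NumPy array."""
    with tables.open_file(path, mode="r") as h5:
        return h5.root.data[start:stop]


path = os.path.join(tempfile.mkdtemp(), "huge.h5")
append_in_chunks(path, n_chunks=4, chunk_len=1000)

# A slice that straddles the boundary between the third and fourth chunks.
tail = read_slice(path, 2990, 3010)
```

With real data you would raise n_chunks until the file is well past 2 GB;
each individual chunk, and each slice you read back, still has to fit in
the address space of the process.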

However, I think that what Daniel is trying to achieve is to keep all the
data in memory, because writing it to disk is too slow.  I also agree that
your suggestion to use a 64-bit OS (or 32-bit Linux, as it can address
the full 3 GB right out of the box, as Chuck said) is the way to go.

OTOH, having the possibility to manage compressed data buffers transparently
in NumPy would help here, but we are not there yet ;-)
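
Just to illustrate what I mean by compressed buffers, here is a rough sketch
that keeps a 1-D array in memory as a list of zlib-compressed chunks and
decompresses a chunk only when an element is requested.  The
CompressedChunks class and its methods are purely hypothetical (nothing like
this exists in NumPy), and zlib is used only because it ships with Python:

```python
import zlib

import numpy as np


class CompressedChunks:
    """Hypothetical container: a 1-D array stored as compressed chunks in RAM."""

    def __init__(self, dtype, chunk_len):
        self.dtype = np.dtype(dtype)
        self.chunk_len = chunk_len
        self._chunks = []  # one compressed bytes object per chunk
        self._length = 0

    def append(self, chunk):
        """Compress one fixed-size chunk and keep only the compressed bytes."""
        chunk = np.ascontiguousarray(chunk, dtype=self.dtype)
        if chunk.shape != (self.chunk_len,):
            raise ValueError("every chunk must have exactly chunk_len items")
        self._chunks.append(zlib.compress(chunk.tobytes()))
        self._length += self.chunk_len

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        """Decompress the chunk that holds the element and return the element."""
        if not 0 <= index < self._length:
            raise IndexError(index)
        block, offset = divmod(index, self.chunk_len)
        raw = zlib.decompress(self._chunks[block])
        return np.frombuffer(raw, dtype=self.dtype)[offset]

    def compressed_nbytes(self):
        """Total size of the compressed chunks, in bytes."""
        return sum(len(c) for c in self._chunks)


store = CompressedChunks(np.float64, chunk_len=100_000)
for i in range(5):
    store.append(np.full(100_000, float(i)))

value = store[250_000]            # lives in the third chunk
packed = store.compressed_nbytes()
```

How much this saves depends entirely on how compressible the data is (the
constant chunks above are the best case), and every access pays for a
decompression, so it trades CPU time for address space.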

-- 
Francesc Alted

