[Numpy-discussion] Loading a > GB file into array

Ivan Vilata i Balaguer ivilata@carabos....
Fri Nov 30 12:19:38 CST 2007


Martin Spacek (on 2007-11-30 at 00:47:41 -0800) said::

>[...]
> I find that if I load the file in two pieces into two arrays, say 1GB
> and 0.3GB respectively, I can avoid the memory error. So it seems that
> it's not that windows can't allocate the memory, just that it can't
> allocate enough contiguous memory. I'm OK with this, but for indexing
> convenience, I'd like to be able to treat the two arrays as if they were
> one. Specifically, this file is movie data, and the array I'd like to
> get out of this is of shape (nframes, height, width).
>[...]

Well, one thing you could do is dump your data into a PyTables_
``CArray`` dataset, which you can afterwards access as if it were a
NumPy array, getting slices that are actual NumPy arrays.  PyTables
datasets have no problem working with data that exceeds memory size.
For instance::

  import tables

  h5f = tables.openFile('foo.h5', 'w')
  carray = h5f.createCArray(
      '/', 'bar', atom=tables.UInt8Atom(), shape=(TOTAL_NROWS, 3))
  # Copy the partial in-memory arrays into the on-disk CArray.
  base = 0
  for part in your_list_of_partial_arrays:
      carray[base:base+len(part)] = part
      base += len(part)
  carray.flush()

  # Now you can access ``carray`` as a NumPy array.
  carray[42] --> a (3,) uint8 NumPy array
  carray[10:20] --> a (10, 3) uint8 NumPy array
  carray[42,2] --> a NumPy uint8 scalar, "width" for row 42

(You may use an ``EArray`` dataset if you want to enlarge it with new
rows afterwards, or a ``Table`` if you want a different type for each
field.)

.. _PyTables: http://www.pytables.org/
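
For your movie case specifically, a minimal sketch of the whole round
trip might look like the following.  The ``movie.h5`` file name, the
``/frames`` node, the ``uint8`` dtype and the (small) frame dimensions
are just assumptions for illustration, using the same PyTables calls as
above::

  import numpy as np
  import tables

  # Assumed movie dimensions, deliberately small for illustration.
  nframes, height, width = 1000, 120, 160

  # Two partial arrays standing in for your 1 GB and 0.3 GB pieces.
  parts = [np.zeros((600, height, width), dtype=np.uint8),
           np.ones((400, height, width), dtype=np.uint8)]

  # Write both pieces into a single on-disk CArray.
  h5f = tables.openFile('movie.h5', 'w')
  frames = h5f.createCArray('/', 'frames', atom=tables.UInt8Atom(),
                            shape=(nframes, height, width))
  base = 0
  for part in parts:
      frames[base:base + len(part)] = part
      base += len(part)
  h5f.close()

  # Later, reopen read-only and index it like one big array.
  h5f = tables.openFile('movie.h5', 'r')
  frames = h5f.root.frames
  frame42 = frames[42]      # a (height, width) uint8 NumPy array
  clip = frames[100:200]    # a (100, height, width) uint8 NumPy array
  h5f.close()

Slices read back as ordinary NumPy arrays, so as long as you only touch
a bounded number of frames at a time, code written against a single
(nframes, height, width) array should need few changes.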

HTH,

::

	Ivan Vilata i Balaguer   >qo<   http://www.carabos.com/
	       Cárabos Coop. V.  V  V   Enjoy Data
	                          ""