[Numpy-discussion] Loading a > GB file into array
Ivan Vilata i Balaguer
ivilata@carabos....
Fri Nov 30 12:19:38 CST 2007
Martin Spacek (on 2007-11-30 at 00:47:41 -0800) said::
>[...]
> I find that if I load the file in two pieces into two arrays, say 1GB
> and 0.3GB respectively, I can avoid the memory error. So it seems that
it's not that Windows can't allocate the memory, just that it can't
> allocate enough contiguous memory. I'm OK with this, but for indexing
> convenience, I'd like to be able to treat the two arrays as if they were
> one. Specifically, this file is movie data, and the array I'd like to
> get out of this is of shape (nframes, height, width).
>[...]
Well, one thing you could do is dump your data into a PyTables_
``CArray`` dataset, which you may afterwards access as if it were a
NumPy array, getting slices which are actually NumPy arrays.  PyTables
datasets have no problem working with data exceeding the memory size.
For instance::
    import tables

    h5f = tables.openFile('foo.h5', 'w')
    carray = h5f.createCArray(
        '/', 'bar', atom=tables.UInt8Atom(), shape=(TOTAL_NROWS, 3))
    # Copy each in-memory piece into its slot on disk.
    base = 0
    for array in your_list_of_partial_arrays:
        carray[base:base+len(array)] = array
        base += len(array)
    carray.flush()

    # Now you can access ``carray`` as if it were a NumPy array.
    carray[42]     --> a (3,) uint8 NumPy array
    carray[10:20]  --> a (10, 3) uint8 NumPy array
    carray[42, 2]  --> a NumPy uint8 scalar, "width" for row 42
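Applied to the movie data you describe, a minimal sketch might look
like this (``nframes``, ``height``, ``width`` and the file name are
placeholders for your actual values, not tested code)::

    import tables

    nframes, height, width = 13000, 32, 32   # hypothetical dimensions

    h5f = tables.openFile('movie.h5', 'w')
    frames = h5f.createCArray(
        '/', 'frames', atom=tables.UInt8Atom(),
        shape=(nframes, height, width))

    # ... copy each partial in-memory array into its slot, as above ...

    frame = frames[42]       # one (height, width) uint8 frame
    clip = frames[100:200]   # a (100, height, width) uint8 chunk
    h5f.close()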
(You may use an ``EArray`` dataset if you want to enlarge it with new
rows afterwards, or a ``Table`` if you want a different type for each
field.)
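For instance, a minimal ``EArray`` sketch (reusing the illustrative
names from above) could be::

    import tables

    h5f = tables.openFile('foo.h5', 'w')
    # A 0 in ``shape`` marks the dimension the dataset can grow along.
    earray = h5f.createEArray(
        '/', 'bar', atom=tables.UInt8Atom(), shape=(0, 3))
    for array in your_list_of_partial_arrays:
        earray.append(array)   # no need to track offsets by hand
    h5f.close()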
.. _PyTables: http://www.pytables.org/
HTH,
::
    Ivan Vilata i Balaguer   >qo<   http://www.carabos.com/
           Cárabos Coop. V.   V  V   Enjoy Data
                               ""