[Numpy-discussion] Loading a > GB file into array

Ivan Vilata i Balaguer ivilata@carabos....
Sat Dec 1 05:57:57 CST 2007

Ivan Vilata i Balaguer (on 2007-11-30 at 19:19:38 +0100) said::

> Well, one thing you could do is dump your data into a PyTables_
> ``CArray`` dataset, which you can afterwards access as if it were a
> NumPy array, taking slices that are actual NumPy arrays.  PyTables
> datasets have no problem working with datasets exceeding memory size.
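For reference, a minimal sketch of that dump-and-slice workflow, using the current PyTables API (the function names, dataset name ``frames``, and dimensions below are my own illustration, not the attached script):

```python
import numpy as np
import tables  # PyTables

def dump_frames(h5_path, frames):
    """Dump a (nframes, h, w) array into a chunked CArray on disk."""
    with tables.open_file(h5_path, mode="w") as f:
        carray = f.create_carray(
            f.root, "frames",
            atom=tables.Atom.from_dtype(frames.dtype),
            shape=frames.shape,
        )
        carray[...] = frames  # written out chunk by chunk

def load_frame(h5_path, i):
    """Read back a single frame; only that frame's chunks are fetched,
    and the result is a plain numpy.ndarray."""
    with tables.open_file(h5_path, mode="r") as f:
        return f.root.frames[i]
```

Slicing the ``CArray`` node (``f.root.frames[i]``) reads just the chunks covering that slice, so the whole file never has to fit in memory.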

I've put together the simple script I've attached, which dumps a binary
file into a PyTables ``CArray`` or loads it back, measuring the time
taken to load each frame.  Running it on my laptop, which has a fairly
slow 4200 RPM hard disk, I measured average times of 16 ms per frame
after dropping the filesystem caches with::

    # sync && echo 1 > /proc/sys/vm/drop_caches

I did this with the default chunkshape and no compression.  Your data
may lend itself very well to bigger chunkshapes and compression, which
should lower access times even further.  Since (as David pointed out)
200 Hz may be a little high for the human eye, loading individual
frames from disk may prove more than enough for your problem.
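As a rough illustration of those two knobs, creating the ``CArray`` with a chunkshape of one whole frame and zlib compression might look like this (dataset name, dimensions, and compression level are hypothetical, not from the original post):

```python
import tables  # PyTables

def create_compressed_carray(h5_path, n_frames, h, w):
    """Create an empty CArray chunked one frame at a time, with
    zlib compression.  Names and sizes here are illustrative only."""
    filters = tables.Filters(complevel=5, complib="zlib", shuffle=True)
    with tables.open_file(h5_path, mode="w") as f:
        f.create_carray(
            f.root, "frames",
            atom=tables.Float32Atom(),
            shape=(n_frames, h, w),
            chunkshape=(1, h, w),  # each chunk holds exactly one frame
            filters=filters,
        )
```

With one frame per chunk, reading frame ``i`` touches a single (compressed) chunk, so per-frame access cost stays flat regardless of the total file size.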



	Ivan Vilata i Balaguer   >qo<   http://www.carabos.com/
	       Cárabos Coop. V.  V  V   Enjoy Data
-------------- next part --------------
A non-text attachment was scrubbed...
Name: frames.py
Type: text/x-python
Size: 2664 bytes
Desc: not available
Url : http://projects.scipy.org/pipermail/numpy-discussion/attachments/20071201/3240371a/attachment.py 