[Numpy-discussion] loading data

Mag Gam magawake@gmail....
Fri Jun 26 06:46:13 CDT 2009


Yes, you are correct!

I think this is the best path.

However, I need to learn how to append to an HDF5 dataset. I looked at
this, http://code.google.com/p/h5py/wiki/FAQ#Appending_data_to_a_dataset
but was not able to get it working. Do you happen to have any sample code
for this, if you have used HDF5?
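
For reference, appending with h5py might look roughly like the sketch below. It assumes the dataset is created as resizable (`maxshape=(None, ...)`) and is then grown with `Dataset.resize()` before each write. The helper name `append_rows`, the dataset name "data", the 3-column shape and the chunk size are all made up for illustration, and the file is kept in memory (`driver="core"`) only so the example leaves nothing on disk.

```python
import h5py
import numpy as np


def append_rows(dset, rows):
    """Grow a resizable dataset along axis 0 and write `rows` at the end.

    `append_rows` is a hypothetical helper, not part of h5py.
    """
    rows = np.asarray(rows, dtype=dset.dtype)
    old = dset.shape[0]
    dset.resize(old + rows.shape[0], axis=0)  # extend the first dimension
    dset[old:old + rows.shape[0]] = rows      # fill the newly added region


# In-memory HDF5 file; use a real filename and drop the driver arguments
# to write to disk instead.
with h5py.File("append_demo.h5", "w", driver="core", backing_store=False) as f:
    # maxshape=(None, 3) makes axis 0 unlimited; resizable datasets must be chunked.
    dset = f.create_dataset(
        "data", shape=(0, 3), maxshape=(None, 3), dtype="f8", chunks=(1024, 3)
    )
    append_rows(dset, np.ones((5, 3)))
    append_rows(dset, np.zeros((2, 3)))
    final_shape = dset.shape
    first_row = dset[0].tolist()
```

Each call resizes the dataset once, so it is cheaper to append blocks of many rows than one row at a time.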




On Fri, Jun 26, 2009 at 7:31 AM, Francesc Alted<faltet@pytables.org> wrote:
> On Friday 26 June 2009 13:09:13 Mag Gam wrote:
>> I really like the slice by slice idea!
>
> Hmm, after looking at the np.loadtxt() docstring, it seems it works by loading
> the complete file at once, so you shouldn't use it directly (unless you
> split your big file beforehand, but that will take time too).  So, I'd say that
> your best bet would be to use Python's `csv.reader()` iterator to iterate over
> the lines in your file and set up a buffer (a NumPy array/recarray would be
> fine), so that when the buffer is full it is written to the HDF5 file.  That
> should be pretty optimal.
>
> With this you will not try to load the entire file into memory, which is what
> I think is probably killing the performance in your case (unless your machine
> has much more memory than 50 GB, that is).
>
> --
> Francesc Alted
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
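
The buffered approach described in the quoted message could be sketched as follows. This is only an illustration under several assumptions: the CSV is purely numeric with a fixed number of columns, h5py is used for the HDF5 side, and the function name `csv_to_hdf5`, the dataset name "table" and the buffer size are hypothetical. A small in-memory string stands in for the real 50 GB file.

```python
import csv
import io

import h5py
import numpy as np


def csv_to_hdf5(lines, dset, buffer_rows=10000):
    """Stream numeric CSV rows into a resizable dataset through a fixed buffer.

    `lines` is any iterable of text lines (e.g. an open file object).
    Returns the total number of rows in the dataset afterwards.
    """
    ncols = dset.shape[1]
    buf = np.empty((buffer_rows, ncols), dtype=dset.dtype)
    filled = 0

    def flush(n):
        # Grow the dataset by n rows and copy the used part of the buffer.
        start = dset.shape[0]
        dset.resize(start + n, axis=0)
        dset[start:start + n] = buf[:n]

    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        buf[filled] = [float(x) for x in row]
        filled += 1
        if filled == buffer_rows:  # buffer full: write it out and reuse it
            flush(filled)
            filled = 0
    if filled:  # write whatever is left in a partially filled buffer
        flush(filled)
    return dset.shape[0]


# Tiny stand-in for the real file; in practice pass open("big.csv", newline="").
sample = io.StringIO("1,2,3\n4,5,6\n7,8,9\n10,11,12\n13,14,15\n")
with h5py.File("buffer_demo.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "table", shape=(0, 3), maxshape=(None, 3), dtype="f8", chunks=(1024, 3)
    )
    total = csv_to_hdf5(sample, dset, buffer_rows=2)
    last_row = dset[total - 1].tolist()
```

Only one buffer's worth of rows is held in memory at any time, so the buffer size trades memory use against the number of HDF5 writes.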

