[Numpy-discussion] loading data
Fri Jun 26 06:46:13 CDT 2009
Yes, you are correct!
I think this is the best path.
However, I need to learn how to append to an HDF5 dataset. I looked at
but was not able to do so. Do you happen to have any sample code for
this, if you have used HDF5?
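
For reference, here is a minimal sketch of one common way to append to an
HDF5 dataset from Python. It assumes the h5py package (the thread does not
say which HDF5 binding is in use) and a dataset created as resizable along
its first axis with maxshape=(None, ncols). The helper name append_rows, the
file name "example.h5" and the dataset name "data" are made up for
illustration.

```python
def append_rows(dset, rows):
    """Append a 2-D block of rows to the end of a resizable dataset.

    `dset` is assumed to behave like an h5py.Dataset created with
    maxshape=(None, ncols): it must offer .shape, .resize(size, axis=...)
    and slice assignment. Returns the new number of rows.
    """
    n_new = len(rows)
    if n_new == 0:
        return dset.shape[0]
    old_len = dset.shape[0]
    dset.resize(old_len + n_new, axis=0)    # grow along the first axis
    dset[old_len:old_len + n_new] = rows    # fill the newly added rows
    return dset.shape[0]


def demo(path="example.h5"):  # hypothetical file name
    # Imported here so append_rows itself has no hard dependencies.
    import numpy as np
    import h5py

    with h5py.File(path, "w") as f:
        # shape=(0, 3) starts empty; maxshape=(None, 3) allows unlimited rows.
        dset = f.create_dataset("data", shape=(0, 3), maxshape=(None, 3),
                                dtype="f8", chunks=True)
        append_rows(dset, np.ones((4, 3)))
        append_rows(dset, np.zeros((2, 3)))
        return dset.shape
```

The key point is that the dataset has to be declared resizable when it is
created; a dataset created with a fixed shape cannot be grown later.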
On Fri, Jun 26, 2009 at 7:31 AM, Francesc Alted<email@example.com> wrote:
> On Friday 26 June 2009 13:09:13 Mag Gam wrote:
>> I really like the slice by slice idea!
> Hmm, after looking at the np.loadtxt() docstring, it seems it works by loading
> the complete file at once, so you shouldn't use it directly (unless you
> split your big file beforehand, but that will take time too). So, I'd say that
> your best bet would be to use Python's `csv.reader()` iterator to iterate over
> the lines in your file and set up a buffer (a NumPy array/recarray would be
> fine), so that when the buffer is full it is written to the HDF5 file. That
> should be pretty optimal.
> With this you will not try to load the entire file into memory, which is what
> I think is probably killing the performance in your case (unless your machine
> has much more memory than 50 GB, that is).
> Francesc Alted
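
The buffering approach described in the quoted message could be sketched
roughly as follows. This is only an illustration under some assumptions: the
file has no header row, every column is numeric, and a plain Python list is
used as the buffer instead of the NumPy array/recarray suggested above. The
name iter_chunks and the default chunk_rows value are invented for this
example.

```python
import csv


def iter_chunks(lines, chunk_rows=10000, delimiter=","):
    """Yield lists of at most `chunk_rows` parsed rows from a CSV source.

    `lines` can be an open file object or any iterable of text lines.
    Only one chunk is held in memory at a time.
    """
    reader = csv.reader(lines, delimiter=delimiter)
    buffer = []
    for record in reader:
        if not record:          # csv.reader returns [] for blank lines
            continue
        buffer.append([float(x) for x in record])
        if len(buffer) == chunk_rows:
            yield buffer        # buffer is full: hand it to the caller
            buffer = []
    if buffer:                  # flush the final, partially filled buffer
        yield buffer
```

Each yielded chunk can then be converted (for example with np.asarray) and
written to the HDF5 file before the next chunk is read, so memory use is
bounded by chunk_rows rather than by the size of the input file.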