[Numpy-discussion] Efficient way to load a 1Gb file?
Russell E. Owen
Thu Aug 11 13:50:14 CDT 2011
Anne Archibald <email@example.com> wrote:
> There was also some work on a semi-mutable array type that allowed
> appending along one axis, then 'freezing' to yield a normal numpy
> array (unfortunately I'm not sure how to find it in the mailing list
> archives). One could write such a setup by hand, using mmap() or
> realloc(), but I'd be inclined to simply write a filter that converted
> the text file to some sort of binary file on the fly, value by value.
> Then the file can be loaded in or mmap()ed. A 1 Gb text file is a
> miserable object anyway, so it might be desirable to convert to (say)
> HDF5 and then throw away the text file.
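For reference, the text-to-binary filter suggested above might look roughly like the sketch below. It assumes a whitespace-delimited file of floats with the same number of columns on every row and no header or comment lines. The names text_to_binary and open_binary are made up for illustration and are not part of numpy.

```python
import numpy as np


def text_to_binary(text_path, bin_path, dtype=np.float64):
    """Stream a whitespace-delimited text file into a flat binary file.

    Hypothetical helper: assumes every non-blank row has the same number
    of numeric fields. Returns the column count of the first data row.
    """
    ncols = None
    with open(text_path) as src, open(bin_path, "wb") as dst:
        for line in src:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            if ncols is None:
                ncols = len(fields)
            elif len(fields) != ncols:
                raise ValueError("ragged row: %r" % line)
            # Only one row is held in memory at a time.
            row = np.array([float(x) for x in fields], dtype=dtype)
            row.tofile(dst)
    return ncols


def open_binary(bin_path, ncols, dtype=np.float64):
    """Memory-map the binary file and view it as a 2-D (rows, ncols) array."""
    flat = np.memmap(bin_path, dtype=dtype, mode="r")
    return flat.reshape(-1, ncols)
```

The raw binary file carries no shape or dtype information, so the column count and dtype have to be remembered separately. That is one reason a self-describing format such as HDF5 may be the better long-term choice.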
Thank you and the others for your help.
It seems a shame that loadtxt has no argument for predicted length,
which would allow preallocation and less appending/copying data.
And yes...reading the whole file first to figure out how many elements
it has seems sensible to me -- at least as a switchable behavior, and
preferably the default. 1Gb isn't that large in modern systems, but
loadtxt is filling up all 6Gb of RAM reading it!
I'll suggest the HDF5 solution to my colleague. Meanwhile I think he's
hacked around the problem by reading the file through once to figure out
the array length, allocating that, and reading the data in with a Python
loop. Sounds slow, but it's working.
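I don't have his code, but a two-pass reader along those lines could be sketched as follows. It assumes a whitespace-delimited file of floats with no header or comment lines, and load_two_pass is a made-up name, not a numpy function.

```python
import numpy as np


def load_two_pass(path, dtype=np.float64):
    """Read a text table by counting rows first, then filling a preallocated array.

    Hypothetical helper: assumes whitespace-delimited numeric rows of equal
    length, with no header or comment lines.
    """
    # Pass 1: count non-blank rows; take the column count from the first one.
    nrows = 0
    ncols = 0
    with open(path) as f:
        for line in f:
            fields = line.split()
            if fields:
                if nrows == 0:
                    ncols = len(fields)
                nrows += 1

    # Allocate the final array exactly once, so nothing is appended or copied.
    out = np.empty((nrows, ncols), dtype=dtype)

    # Pass 2: parse each row straight into its slot.
    with open(path) as f:
        i = 0
        for line in f:
            fields = line.split()
            if fields:
                # Raises ValueError if a row has the wrong number of fields.
                out[i] = [float(x) for x in fields]
                i += 1
    return out
```

Peak memory here is roughly the size of the final array plus one line of text, at the cost of reading the file twice and parsing in a Python loop.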