[Numpy-discussion] memory-efficient loadtxt
Paul Anton Letnes
Sun Sep 30 09:16:41 CDT 2012
For convenience and clarity, this is the diff in question:
And this is my numpy fork:
On Sun, Sep 30, 2012 at 4:14 PM, Paul Anton Letnes wrote:
> Hello everyone,
> I've modified loadtxt to make it (potentially) more memory efficient.
> The idea is that if a user passes a seekable file, (s)he can also pass
> the 'seekable=True' kwarg. Then, loadtxt will count the number of
> lines (containing data) and allocate an array of exactly the right
> size to hold the loaded data. The downside is that the line counting
> more than doubles the runtime, as it loops over the file twice, and
> there's a sort-of unnecessary np.array function call in the loop. The
> branch is called faster-loadtxt, which is silly due to the runtime
> doubling, but I'm hoping that the false advertising is acceptable :)
> (I naively expected a speedup by removing some needless list
> I'm pretty sure that the function can be micro-optimized quite a bit
> here and there, and in particular, the main for loop is a bit
> duplicated right now. However, I got the impression that someone was
> working on a More Advanced (TM) C-based file reader, which will
> replace loadtxt; this patch is intended as a useful thing to have
> while we're waiting for that to appear.
> The patch passes all tests in the test suite, and documentation for
> the kwarg has been added. I've modified all tests to include the
> seekable kwarg, but that was mostly to check that all tests also pass
> with this kwarg. I guess it's a bit too late for 1.7.0 though?
> Should I make a pull request? I'm happy to take any and all
> suggestions before I do.
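
To make the two-pass idea in the quoted message concrete, here is a minimal sketch. It is not the code from the patch and not numpy's loadtxt: the helper name load_two_pass and its parameters are hypothetical, it only handles float data, and it assumes every data line has the same number of columns. The first pass counts the lines that contain data, the file is rewound with seek(), and the second pass fills an array allocated at exactly the right size.

```python
import io

import numpy as np


def load_two_pass(fh, comments="#", delimiter=None):
    """Hypothetical two-pass reader for a seekable text file of floats."""

    def data_fields(line):
        # Drop anything after the comment marker, then surrounding whitespace.
        line = line.split(comments, 1)[0].strip()
        return line.split(delimiter) if line else []

    start = fh.tell()  # remember where the data begins

    # Pass 1: count data lines and take the column count from the first one.
    nrows, ncols = 0, 0
    for line in fh:
        fields = data_fields(line)
        if fields:
            if nrows == 0:
                ncols = len(fields)
            nrows += 1

    # Allocate exactly once, with no intermediate list of rows.
    out = np.empty((nrows, ncols), dtype=float)

    # Pass 2: rewind and fill the preallocated array row by row.
    fh.seek(start)
    i = 0
    for line in fh:
        fields = data_fields(line)
        if fields:
            out[i] = [float(x) for x in fields]
            i += 1
    return out


# Usage with an in-memory file; comment and blank lines are skipped.
buf = io.StringIO("# header\n1 2 3\n\n4 5 6  # trailing comment\n")
arr = load_two_pass(buf)
print(arr.shape)  # (2, 3)
```

The trade-off is the one described in the message: peak memory stays close to the size of the final array, but every line is read and split twice, so the runtime roughly doubles.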