[Numpy-discussion] np.loadtxt : yet a new implementation...
Mon Dec 1 16:55:43 CST 2008
I agree, genloadtxt is a bit bloated, and it's no surprise that it's
slower than the initial one. To be fair, though, comparisons should be
made against matplotlib.mlab.csv2rec, which also implements
autodetection of the dtype. I'm quite in favor of keeping a lite
version around.
On Dec 1, 2008, at 4:47 PM, Stéfan van der Walt wrote:
> I haven't investigated the code in too much detail, but wouldn't it be
> possible to implement the current set of functionality in a
> base-class, which is then specialised to add the rest? That way, one
> could always instantiate TextReader oneself for some added speed.
Well, one of the issues is that we need to keep the function
compatible with urllib.urlretrieve (Ryan, am I right?), which means we
can't go back to the beginning of a file (no call to .seek).
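Since the input can only be read forward, anything that needs more than one look at the data has to keep what it has already read. Here is a minimal sketch of that idea. It assumes the input arrives as a forward-only iterator of text lines, and `read_rows_once` and `non_seekable` are hypothetical helpers, not part of genloadtxt:

```python
import io
from typing import Iterable, Iterator, List


def read_rows_once(lines: Iterable[str], delimiter: str = ",") -> List[List[str]]:
    """Consume a line iterator exactly once and keep the split fields.

    Because the rows are kept in memory, a later pass (for instance dtype
    detection followed by conversion) never needs to call .seek().
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        rows.append([field.strip() for field in line.split(delimiter)])
    return rows


def non_seekable(text: str) -> Iterator[str]:
    """Hypothetical stand-in for a stream that can only be read forward."""
    yield from io.StringIO(text)


rows = read_rows_once(non_seekable("1,2.5\n# comment\n3,4\n"))
print(rows)  # [['1', '2.5'], ['3', '4']]
```

The trade-off is memory: the whole file is held as strings until conversion is finished.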
Another issue comes from the option to determine the dtype
automatically: you need to keep track of the converters, then do a
second loop over the data. Those converters are likely the
bottleneck, as each value has to be checked to see whether it should be
interpreted as missing, and then handled accordingly.
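The two-pass scheme can be sketched as follows. This is a sketch under stated assumptions, not genloadtxt's actual code. It assumes the rows are already split into strings, that `MISSING` is the set of missing-value markers, and that each column's converter is upgraded along int -> float -> str. Every field pays for a membership test plus a try/except, which is where the per-value cost comes from:

```python
from typing import Callable, List, Sequence

MISSING = {"", "N/A", "NA"}  # assumed set of missing-value markers

# Candidate converters, from most to least specific (str never fails).
CANDIDATES: List[Callable[[str], object]] = [int, float, str]


def detect_converters(rows: Sequence[Sequence[str]]) -> List[Callable[[str], object]]:
    """First pass: per column, find the most specific converter that accepts
    every non-missing value."""
    ncols = len(rows[0]) if rows else 0
    level = [0] * ncols  # index into CANDIDATES for each column
    for row in rows:
        for j, value in enumerate(row):
            if value in MISSING:
                continue  # a missing value says nothing about the column type
            while True:
                try:
                    CANDIDATES[level[j]](value)
                    break
                except ValueError:
                    level[j] += 1  # upgrade: int -> float -> str
    return [CANDIDATES[k] for k in level]


def convert(rows, converters, fill=None):
    """Second pass: apply the chosen converters, mapping missing markers to `fill`."""
    return [
        tuple(fill if v in MISSING else conv(v) for v, conv in zip(row, converters))
        for row in rows
    ]


rows = [["1", "2.5", "a"], ["NA", "3", "b"], ["4", "", "c"]]
convs = detect_converters(rows)
print([c.__name__ for c in convs])  # ['int', 'float', 'str']
print(convert(rows, convs))  # [(1, 2.5, 'a'), (None, 3.0, 'b'), (4, None, 'c')]
```

The second loop runs over `rows` kept in memory, so it does not depend on being able to rewind the input.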
I thought about creating a base class, with a specific subclass taking
care of the missing values, but found that it would have duplicated a
lot of code.
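For reference, the base-class/subclass split discussed here could look roughly like the sketch below. `TextReader` is borrowed from the quoted suggestion, but both classes are hypothetical illustrations, not an existing NumPy API. The sketch assumes fixed, user-supplied converters and leaves out dtype autodetection entirely:

```python
from typing import Callable, Iterable, List, Sequence, Tuple


class TextReader:
    """Hypothetical lean base reader: fixed converters, no missing-value handling."""

    def __init__(self, converters: Sequence[Callable[[str], object]], delimiter: str = ","):
        self.converters = list(converters)
        self.delimiter = delimiter

    def split(self, line: str) -> List[str]:
        return [f.strip() for f in line.strip().split(self.delimiter)]

    def convert_field(self, value: str, conv: Callable[[str], object]) -> object:
        return conv(value)  # fast path: no missing-value check

    def read(self, lines: Iterable[str]) -> List[Tuple[object, ...]]:
        rows = []
        for line in lines:
            if not line.strip():
                continue  # skip blank lines
            fields = self.split(line)
            rows.append(tuple(self.convert_field(v, c) for v, c in zip(fields, self.converters)))
        return rows


class MaskedTextReader(TextReader):
    """Hypothetical subclass that adds the per-value missing check."""

    def __init__(self, converters, delimiter=",", missing=("", "NA"), fill=None):
        super().__init__(converters, delimiter)
        self.missing = set(missing)
        self.fill = fill

    def convert_field(self, value, conv):
        if value in self.missing:
            return self.fill  # replace the missing marker instead of converting it
        return conv(value)


print(MaskedTextReader([int, float]).read(["1,2.5", "3,NA"]))  # [(1, 2.5), (3, None)]
```

Only the per-field hook is overridden here. Once autodetection and its second pass also have to know about missing values, more of `read` would need overriding, which is one way the duplication could creep in.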
In any case, I think that's secondary: we can always optimize pieces
of the code afterwards. I'd like more feedback on corner cases and