[Numpy-discussion] genloadtxt: second serving

Pierre GM pgmdevlist@gmail....
Thu Dec 4 05:51:53 CST 2008

Here's the second round of genloadtxt. It's a somewhat cleaner version
than the previous one, in which I tried to take into account the
different comments and suggestions that were posted. Tabs should now
be supported, and explicit whitespace is not collapsed.
FYI, in the __main__ section, you'll find two hotshot tests and a timeit
comparison: same input, no missing data, one run with genloadtxt, one with
np.loadtxt, and a last one with matplotlib.mlab.csv2rec.
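
For readers without the attachment, a comparison of this kind could be
sketched roughly as below. This is not the code from _preview.py: the
helper names (make_input, load_with_loadtxt, load_with_genfromtxt,
compare) are made up for illustration, and np.genfromtxt is used as a
stand-in for genloadtxt, which is an assumption. csv2rec is left out to
keep the sketch NumPy-only.

```python
import io
import timeit

import numpy as np


def make_input(nrows=1000, ncols=5):
    """Build comma-separated text with no missing values (hypothetical data)."""
    rows = (
        ",".join(str(float(r * ncols + c)) for c in range(ncols))
        for r in range(nrows)
    )
    return "\n".join(rows)


DATA = make_input()


def load_with_loadtxt():
    # Each call parses the same text from a fresh in-memory file object.
    return np.loadtxt(io.StringIO(DATA), delimiter=",")


def load_with_genfromtxt():
    # Stand-in for genloadtxt (assumption, see the note above).
    return np.genfromtxt(io.StringIO(DATA), delimiter=",")


def compare(number=5):
    """Return the total time in seconds for `number` calls of each loader."""
    return {
        "loadtxt": timeit.timeit(load_with_loadtxt, number=number),
        "genfromtxt": timeit.timeit(load_with_genfromtxt, number=number),
    }


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds:.4f} s")
```

The absolute numbers depend on the machine and the NumPy version, so only
the ratio between the loaders is meaningful.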

As you'll see, genloadtxt is roughly half as fast as np.loadtxt, but
twice as fast as csv2rec. One explanation for the slowness is the use
of classes for splitting lines and converting values: instead of a
basic function, we use the __call__ method of a class, which itself
calls another function depending on the attribute values. I'd like to
reduce this overhead; any suggestion is more than welcome, as usual.
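
To make the overhead concrete, here is a minimal sketch of the pattern
described above, using only the standard library. The names
(split_plain, LineSplitter, measure) are hypothetical and are not taken
from _preview.py; the class only imitates the "__call__ that dispatches
on an attribute" design so that it can be timed against a plain function.

```python
import timeit


def split_plain(line, delimiter=","):
    """Baseline: a basic function that splits a line directly."""
    return line.split(delimiter)


class LineSplitter:
    """Callable splitter that picks a handler from its `delimiter` attribute."""

    def __init__(self, delimiter=None):
        self.delimiter = delimiter
        # The handler is chosen once, at construction time.
        if delimiter is None:
            self._handler = self._split_whitespace
        else:
            self._handler = self._split_delimited

    def _split_whitespace(self, line):
        return line.split()

    def _split_delimited(self, line):
        return line.split(self.delimiter)

    def __call__(self, line):
        # Extra layer: __call__ forwards to the selected handler.
        return self._handler(line)


def measure(number=100_000):
    """Time the plain function, the callable object, and the bound handler."""
    line = "1.0,2.0,3.0,4.0,5.0"
    splitter = LineSplitter(",")
    bound = splitter._handler  # skips the __call__ layer
    return {
        "plain function": timeit.timeit(lambda: split_plain(line), number=number),
        "class __call__": timeit.timeit(lambda: splitter(line), number=number),
        "bound handler": timeit.timeit(lambda: bound(line), number=number),
    }


if __name__ == "__main__":
    for name, seconds in measure().items():
        print(f"{name}: {seconds:.4f} s")
```

One possible way to trim a layer, shown in the "bound handler" case, is
to look up the selected method once outside the loop and call it
directly. Whether that helps noticeably in genloadtxt itself would have
to be measured.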

Anyhow: as we do need speed, I suggest we put genloadtxt somewhere in
numpy.ma, with an alias recfromcsv for John, using his defaults,
unless somebody comes up with a brilliant optimization.

Let me know how it goes,

-------------- next part --------------
A non-text attachment was scrubbed...
Name: _preview.py
Type: text/x-python-script
Size: 31694 bytes
Desc: not available
Url : http://projects.scipy.org/pipermail/numpy-discussion/attachments/20081204/f07f3c7a/attachment-0001.bin 
-------------- next part --------------