[SciPy-Dev] possible speed-up for arffread

Benjamin Root ben.root@ou....
Tue Jun 15 21:46:21 CDT 2010


Hello,

I was looking at the scipy.io.arff module to see if I could easily shave
some processing time off loading an ARFF file.  Profiling on a file with
40,000 floating point numbers pointed me to the safe_float() function in
arffread.py.  That function strips the string token of any whitespace and
then compares the result to '?' (ARFF's missing-data indicator).  I found
that simply checking for the '?' character, without stripping first, shaves
almost 30% off the processing time of safe_float().
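A minimal sketch of the two approaches, assuming a simplified safe_float() that returns NaN for missing values.  The names safe_float_strip and safe_float_contains are hypothetical and this is not the actual arffread.py code or the attached patch; it only illustrates "strip, then compare" versus "just look for '?'":

```python
import math
import timeit


# Hypothetical "before" version: strip whitespace, then compare to '?'.
def safe_float_strip(x):
    if x.strip() == '?':
        return math.nan
    return float(x)


# Hypothetical "after" version: only test whether '?' occurs in the token.
def safe_float_contains(x):
    if '?' in x:
        return math.nan
    # float() already ignores leading/trailing whitespace, so no strip() needed.
    return float(x)


if __name__ == "__main__":
    # Synthetic tokens, roughly 40,000 values including some missing ones.
    tokens = [" 3.14 ", "?", "  ? ", "2e-3"] * 10000
    for fn in (safe_float_strip, safe_float_contains):
        t = timeit.timeit(lambda: [fn(tok) for tok in tokens], number=5)
        print(f"{fn.__name__}: {t:.3f} s")
```

One trade-off to keep in mind: the containment check treats any token with an embedded '?' (for example "1?2") as missing, whereas the strip-and-compare version would raise ValueError for it.  Actual timings will vary with the Python version and the data.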

In addition, I found a very slight improvement by calculating the range(ni)
once and reusing that variable in the generator function.  Attached is my
patch file.
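A rough sketch of the range(ni) change, assuming a hypothetical row-conversion helper convert_rows() that applies one converter per column (the helper and its signature are illustrative, not the arffread.py code).  The index range is built once and reused by the generator for every row:

```python
# Hypothetical helper: convert each row by applying converters[i] to column i.
def convert_rows(rows, converters):
    ni = len(converters)
    # Build the index range once instead of calling range(ni) for every row.
    idx = range(ni)
    return [tuple(converters[i](row[i]) for i in idx) for row in rows]


if __name__ == "__main__":
    rows = [["1", "a"], ["2.5", "b"]]
    print(convert_rows(rows, [float, str]))
```

In Python 2, range() builds a new list on each call, so hoisting it out of the per-row loop avoids one list allocation per row.  In Python 3, range() returns a lightweight, re-iterable object, so the gain there is likely even smaller.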

It isn't much, but it is noticeable.

Thanks,
Ben Root
-------------- next part --------------
A non-text attachment was scrubbed...
Name: arffread_speedup.patch
Type: text/x-patch
Size: 1288 bytes
Desc: not available
Url : http://mail.scipy.org/pipermail/scipy-dev/attachments/20100615/5a1c2d5c/attachment.bin 

