[SciPy-Dev] possible speed-up for arffread

Benjamin Root ben.root@ou....
Tue Jun 15 21:46:21 CDT 2010


I was looking at the scipy.io.arff module to see if I could easily shave
some processing time off loading an ARFF file.  Profiling on a file with
40,000 floating point numbers pointed me to the safe_float() function in
arffread.py.  It strips the string token of any whitespace and then compares
it to '?' (which is ARFF's missing data indicator).  I found that if one just
checks for the '?' character instead, you can shave almost 30% of the
processing time off of the safe_float() function.
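
For illustration, the two checks can be sketched roughly as follows.  This
is not the code from the patch: the function names safe_float_strip and
safe_float_contains are made up for the comparison, and math.nan stands in
for whatever missing-value marker arffread.py actually returns.

```python
import math


def safe_float_strip(token):
    """Strip-and-compare variant, as described for the existing code."""
    # token.strip() builds a new string on every call before the comparison.
    if token.strip() == '?':
        return math.nan
    return float(token)


def safe_float_contains(token):
    """Membership-check variant, as proposed (hypothetical reconstruction)."""
    # A substring test avoids creating the intermediate stripped string.
    if '?' in token:
        return math.nan
    # float() already tolerates surrounding whitespace in numeric tokens.
    return float(token)
```

Both variants treat a padded token such as ' ? ' as missing, and the timeit
module can be used to compare them on a given machine.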

In addition, I found a very slight improvement by calculating the range(ni)
once and reusing that variable in the generator function.  Attached is my
patch file.
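
The range(ni) change can be sketched along these lines.  The function name
convert_rows and its arguments are assumptions for illustration only and do
not reproduce the generator in arffread.py.

```python
def convert_rows(rows, convertors, delim=','):
    """Yield one tuple of converted values per delimited row (sketch)."""
    ni = len(convertors)
    # Build the index sequence once instead of calling range(ni) per row.
    # On Python 2, range() returned a fresh list on every call; on Python 3
    # it returns a lightweight range object, so the saving is smaller there.
    elems = range(ni)
    for raw in rows:
        row = raw.split(delim)
        yield tuple([convertors[i](row[i]) for i in elems])
```

For example, list(convert_rows(['1.0,2.5'], [float, float])) gives a single
tuple of two floats.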

It isn't much, but it is noticeable.

Ben Root
-------------- next part --------------
A non-text attachment was scrubbed...
Name: arffread_speedup.patch
Type: text/x-patch
Size: 1288 bytes
Desc: not available
Url : http://mail.scipy.org/pipermail/scipy-dev/attachments/20100615/5a1c2d5c/attachment.bin 
