[SciPy-user] efficiently importing ascii data

Darren Dale dd55 at cornell.edu
Thu Nov 10 13:36:57 CST 2005


I'm reading arrays of data from an ASCII file and converting them to the 
appropriate numerical types. The data files can get pretty big, so I was 
wondering if someone here might have a suggestion on how to speed things up. 
The following illustrates two bottlenecks: the list comprehension step and 
the conversion of the resulting list to an array:

from time import clock
from scipy import array
# simulate some data:
s='1e-7,'
ascii_data=(s*1000000)[:-1]
# convert it to an array:
t0=clock()
temp=[float(i) for i in ascii_data.split(',')]
dt=clock()-t0
print dt
data=array(temp)
print clock()-(t0+dt)

On my system, the list comprehension takes 1.8 s, while creating the array 
takes 2.9 s. Could anyone suggest how I might speed things up? I considered 
using map(), which is about 25% faster than the list comprehension, but I've 
read that map will go away in Python 3000. 
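
To make the comparison above concrete, here is a minimal sketch that times the list comprehension against map() and against filling a typed array directly from the iterator. It assumes a current Python 3 interpreter, where map() still exists and returns a lazy iterator, and it uses the standard-library array module as a stand-in for the scipy array in the snippet above. The names fields, via_listcomp, via_map and packed are illustrative only, and the input is smaller than the one-million-value string used in the timings quoted above, so the numbers it prints are not comparable to those.

```python
from array import array
from time import perf_counter

# Simulated input: the same repeated value as above, but only 100000 fields.
ascii_data = ("1e-7," * 100000)[:-1]
fields = ascii_data.split(",")

# Variant 1: list comprehension, as in the original snippet.
t0 = perf_counter()
via_listcomp = [float(field) for field in fields]
t1 = perf_counter()

# Variant 2: map(); in Python 3 it is lazy, so list() forces the conversion.
via_map = list(map(float, fields))
t2 = perf_counter()

# Variant 3: fill a typed array of C doubles straight from the iterator,
# skipping the intermediate list of Python float objects.
packed = array("d", map(float, fields))
t3 = perf_counter()

print("list comprehension: %.4f s" % (t1 - t0))
print("list(map(...)):     %.4f s" % (t2 - t1))
print("array('d', map):    %.4f s" % (t3 - t2))
```

If NumPy is installed, numpy.frombuffer(packed, dtype=float) should expose the same memory as a NumPy array without a second conversion pass; that call is not exercised in the sketch, and whether any of these variants is actually faster depends on the interpreter and the data.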

Thanks,
Darren

P.S. I should mention that my data files are somewhat complex, so I can't use 
Python's csv module or scipy's load. Here's a small example to show the 
complexity of the formatting:

#S 1 growthtime MCS 0.01 3000 MCA 0 5000 
#L      Seconds     monitor      Bicron          I0          I1          I3 
temperature    pressure     vfc_mon          Ti          Mn Epoch 
#@MCA %16C
30 4.56151e+06 44184 5.15098e+06 6.97912e+06 34466 22737.3 1.68483e+07 29984 
2529 882  1079258492.6 
#C MCS pass 0 
@AMCS 18 13 20 18 19 20 16 13 15 14 13 16 15 13 20 7\
 8 14 14 19 12 7 17 16 13 23 21 12 17 13 11 12\
 19 15 17 13 12 14 15 21 11 12 16 11 17 13 18 20\
[This continues on at great length...]
#C MCA data
@AMCA 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\
[so does this]
30 4.56151e+06 44184 5.15098e+06 6.97912e+06 34466 22737.3 1.68483e+07 29984 
2529 882  1079258492.6 
#C MCS pass 1 
@AMCS 18 13 20 18 19 20 16 13 15 14 13 16 15 13 20 7\
 8 14 14 19 12 7 17 16 13 23 21 12 17 13 11 12\
 19 15 17 13 12 14 15 21 11 12 16 11 17 13 18 20\
[...]
#C MCA data
@AMCA 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\
[...]
[etc.]
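
One way to pull the long array records out of a file like the sample above is to merge the backslash-continued lines first and then convert each record in one pass. The sketch below is only that, a sketch under assumptions read off the sample: a physical line ending in a backslash continues on the next line, array records start with "@A", and everything after the tag is whitespace-separated numbers. The helper names join_continuations and extract_arrays are hypothetical, and the scalar rows and "#" header lines are deliberately ignored.

```python
def join_continuations(lines):
    """Merge physical lines ending in a backslash into one logical line."""
    logical = []
    pending = ""
    for raw in lines:
        line = raw.rstrip()  # assumption: trailing whitespace is not significant
        if line.endswith("\\"):
            pending += line[:-1]  # drop the backslash, keep accumulating
        else:
            logical.append(pending + line)
            pending = ""
    if pending:
        # The input ended in the middle of a continued record; keep what we have.
        logical.append(pending)
    return logical


def extract_arrays(text):
    """Return {tag: [values, values, ...]} for every record starting with '@A'."""
    arrays = {}
    for line in join_continuations(text.splitlines()):
        if line.startswith("@A"):
            tag, *tokens = line.split()
            values = [float(token) for token in tokens]
            # The same tag repeats once per pass, so collect a list per tag.
            arrays.setdefault(tag, []).append(values)
    return arrays
```

Calling extract_arrays(open(path).read()), with path standing in for one of the data files, would give one list of values per pass under the keys "@AMCS" and "@AMCA". Each inner list can then be handed to array() in a single call, so the per-value conversion cost discussed above is paid once per record rather than once per physical line.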


