[Numpy-discussion] loadtxt and missing values
Thu Mar 6 14:36:53 CST 2008
I'm relatively new to numpy (and python in general), and so far I have been very pleased! I've been writing an atmospheric boundary-layer observation analysis package to use for my PhD research, and I have run into an issue with the loadtxt function. (As an aside, our dataloggers output ascii data files, so I use loadtxt; eventually the data get converted to netCDF.)
Our SODAR (think radar, but sound waves instead of E&M) spits out a comma-delimited string like:
If the SODAR detects an error, the string will be:
As expected from the doc string (thus not a true 'bug'), loadtxt does not like missing values that are not marked by some 'missing value' (a series of ',,,,,,' does not fly!).
It's probably not the best way (noob, that's me), but this situation could be fixed by:
1) add a fill keyword to loadtxt such that
2) add the following after the line "vals = line.split(delimiter)" (line 713 in core/numeric.py, numpy 1.0.4)
for j in range(0,len(vals)):
    if vals[j] != '':
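To make the idea concrete, here is a minimal standalone sketch of the fill step, written outside of loadtxt. The helper names fill_missing and load_with_fill, the default fill of '-999', and the sample rows are all made up for illustration; none of them are part of numpy, and an actual patch would live inside loadtxt itself.

```python
import numpy as np


def fill_missing(vals, fill):
    """Return a copy of vals with empty fields replaced by the fill value."""
    return [v if v.strip() != '' else fill for v in vals]


def load_with_fill(lines, delimiter=',', fill='-999', skipcols=0):
    """Parse delimited text lines into a 2-D float array.

    Hypothetical helper, not a numpy function: empty fields become `fill`
    and the first `skipcols` columns (e.g. a time stamp) are dropped.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        vals = fill_missing(line.split(delimiter), fill)
        rows.append([float(v) for v in vals[skipcols:]])
    return np.array(rows)


# Made-up sample: a time stamp followed by three values; the second row
# mimics an error record where every value is missing.
sample = [
    "200803061436,1.5,2.5,3.5",
    "200803061437,,,",
]
data = load_with_fill(sample, skipcols=1)
```

Here data ends up with one row of real values and one row of fill values, so the error records can be masked out later instead of stopping the read.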
Load an 18,000-line ascii dataset, 22 float variables on each line, skipping the first column (it's a time stamp).
Timings using %timeit in ipython:
Reading an ascii file with no missing values using the current version of loadtxt:
***10 loops, best of 3: 704 ms per loop
Reading an ascii file with no missing values using the proposed changes to loadtxt:
***10 loops, best of 3: 802 ms per loop
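For anyone who wants to try a comparison like this outside of ipython, the standard timeit module should work. The sketch below builds a small synthetic file in memory and times plain loadtxt on it; the row count and values are made up and much smaller than my real dataset, and read_plain is just an illustrative name.

```python
import io
import timeit

import numpy as np

# Synthetic stand-in for the real data: 200 lines, a time stamp column
# followed by 22 float columns (made-up values, not the 18,000-line file).
nrows, ncols = 200, 22
text = "\n".join(
    ",".join([str(i)] + ["%.2f" % (i + j) for j in range(ncols)])
    for i in range(nrows)
)


def read_plain():
    """Read the in-memory file with loadtxt, skipping the time stamp column."""
    return np.loadtxt(io.StringIO(text), delimiter=",",
                      usecols=list(range(1, ncols + 1)))


# Best of 3 repeats, 10 calls each, reported per call (similar to %timeit).
best = min(timeit.repeat(read_plain, number=10, repeat=3)) / 10
print("%.1f ms per loop" % (best * 1e3))
```

Swapping read_plain for a reader that does the fill step gives the second number for the comparison.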
The changes do create a slight performance hit for those who use loadtxt to read in nicely behaving ascii data. If this is an issue, could a loadtxt2 function be added?
School of Meteorology
University of Oklahoma