[Numpy-discussion] loadtxt and missing values

caver_sean@o... caver_sean@o...
Thu Mar 6 14:36:53 CST 2008


I'm relatively new to numpy (and Python in general), and so far I have been very pleased!  I've been writing an atmospheric boundary-layer observation analysis package to use for my PhD research and I have run into an issue with the loadtxt function (as an aside, our dataloggers output ASCII data files, so I use loadtxt; eventually the data get converted to netCDF).

The issue:

Our SODAR (think radar, but sound waves instead of E&M) spits out a comma-delimited string like:

   yyyy-mm-dd hh:mm:ss,val1,val2,val3,error_code,...,val48,val49\n

If the SODAR detects an error, the string will be:

   yyyy-mm-dd hh:mm:ss,,,,error_code,...,,\n

As expected from the doc string (thus not a true 'bug'), loadtxt cannot handle missing values that show up as empty fields rather than as an explicit 'missing value' marker (a series of ',,,,,,' does not fly!).
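
To illustrate why the empty fields are a problem, here is a small sketch in plain Python. The sample record and its values are made up, and only a few columns are shown. Splitting on the delimiter yields empty strings, and converting an empty string to a float raises ValueError:

```python
# Hypothetical SODAR record with an error: the value fields are empty.
line = "2008-03-06 14:36:53,,,,7,,\n"

# Splitting on the delimiter keeps the empty fields as empty strings.
vals = line.strip().split(",")


def try_float(text):
    """Return float(text), or None if the text cannot be converted."""
    try:
        return float(text)
    except ValueError:
        return None


# Skip the time stamp in column 0 and attempt the conversion on the rest.
converted = [try_float(v) for v in vals[1:]]
```

Only the error code converts here; every empty field fails.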

Proposed solution:

It's probably not the best way (noob, that's me), but this situation could be fixed by:

1) add a fill keyword to loadtxt such that

def loadtxt(...,fill=-999):

2) add the following after the line "vals = line.split(delimiter)" (line 713 in core/numeric.py, numpy 1.0.4)

       for j in range(0,len(vals)):
           if vals[j] != '':
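
Presumably the loop in step 2 replaces each empty field with the fill value before the converters run. A minimal standalone sketch of that idea follows. Note that `parse_line` and its `skip_cols` argument are hypothetical names used for illustration only (they are not part of numpy), and the sample records are made up:

```python
def parse_line(line, delimiter=",", fill=-999, skip_cols=1):
    """Split one record and convert it to floats, filling empty fields.

    Hypothetical helper: `fill` mirrors the proposed keyword, and
    `skip_cols` drops leading non-numeric columns such as a time stamp.
    """
    vals = line.strip().split(delimiter)[skip_cols:]
    for j in range(len(vals)):
        if vals[j].strip() == "":
            vals[j] = fill  # empty field: substitute the fill value
    return [float(v) for v in vals]


# A complete record and a record where the SODAR reported an error.
good = parse_line("2008-03-06 14:36:53,1.5,2.5,3.5,0,4.5\n")
bad = parse_line("2008-03-06 14:36:53,,,,7,\n")
```

With this approach the error record still yields a full row, with the fill value standing in for each missing measurement and the error code preserved.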

Testing: -------------------------

Load an 18,000-line ASCII dataset, 22 float variables on each line, skipping the first column (it's a time stamp).

Timings using %timeit in ipython:

Reading an ascii file with no missing values using the current version of loadtxt:
***10 loops, best of 3: 704 ms per loop

Reading an ascii file with no missing values using the proposed changes to loadtxt:
***10 loops, best of 3: 802 ms per loop

The changes do create a slight performance hit for those who use loadtxt to read in well-behaved ASCII data.  If this is an issue, could a loadtxt2 function be added?
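
For anyone wanting to gauge the per-line cost of the extra check without patching numpy, a rough sketch using the standard-library timeit module might look like the following. The synthetic data (18,000 identical lines of 22 values) and the function names are assumptions for illustration. It times only the split and conversion steps, not loadtxt itself, so the numbers will not match the timings above:

```python
import timeit

# Synthetic stand-in for the dataset: a time stamp plus 22 float fields.
record = "2008-03-06 14:36:53," + ",".join(["1.25"] * 22)
lines = [record] * 18000


def convert_plain(rows):
    """Split and convert every field after the time stamp."""
    return [[float(v) for v in row.split(",")[1:]] for row in rows]


def convert_with_fill(rows, fill=-999):
    """Same as convert_plain, but check each field for emptiness first."""
    out = []
    for row in rows:
        vals = row.split(",")[1:]
        for j in range(len(vals)):
            if vals[j] == "":
                vals[j] = fill
        out.append([float(v) for v in vals])
    return out


# Best of 3 runs for each variant, in seconds.
t_plain = min(timeit.repeat(lambda: convert_plain(lines), number=1, repeat=3))
t_fill = min(timeit.repeat(lambda: convert_with_fill(lines), number=1, repeat=3))
print("plain: %.1f ms, with fill check: %.1f ms" % (t_plain * 1e3, t_fill * 1e3))
```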


Sean Arms
Ph.D. Student
School of Meteorology
University of Oklahoma
