[Numpy-tickets] [NumPy] #717: numpy.loadtxt fails when missing values are present

NumPy numpy-tickets@scipy....
Thu Apr 3 21:25:35 CDT 2008


#717: numpy.loadtxt fails when missing values are present
--------------------------+-------------------------------------------------
 Reporter:  lesserwhirls  |       Owner:  somebody              
     Type:  enhancement   |      Status:  new                   
 Priority:  normal        |   Milestone:  1.0.5                 
Component:  numpy.core    |     Version:  none                  
 Severity:  normal        |    Keywords:  loadtxt missing values
--------------------------+-------------------------------------------------
 == Problem ==

 numpy.loadtxt fails when missing values are present.  For example, assume
 your data file is well behaved:
 {{{
 val1,val2,val3,val4,val5\n
 }}}
 loadtxt handles this case without any problem.  If, however, the data
 file is not 'well behaved' and contains missing values (empty fields):
 {{{
 val1,val2,,val4,val5\n
 }}}
 loadtxt fails: the empty field is passed to the float converter, and
 float('') raises a ValueError.
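 The failure can be reproduced without loadtxt itself. The sketch below uses a hypothetical helper, convert, that splits a line and applies float() to every field, roughly mimicking loadtxt's default conversion. The numbers stand in for val1...val5 and are assumptions.

```python
# Stand-ins for the two example lines above (the numbers are assumptions)
good = "1.0,2.0,3.0,4.0,5.0\n"
bad = "1.0,2.0,,4.0,5.0\n"


def convert(line, delimiter=","):
    """Hypothetical helper: split a line and apply float() to every field."""
    return [float(z) for z in line.strip().split(delimiter)]


print(convert(good))  # [1.0, 2.0, 3.0, 4.0, 5.0]

try:
    convert(bad)
except ValueError as err:
    # The empty third field '' cannot be converted to a float
    print("ValueError:", err)
```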

 ----

 == Solution ==


 1) Add a keyword argument, fill, to the definition of loadtxt:

 {{{
 def loadtxt(...,fill=-999):
 }}}

 2) Add the following after the line "vals = line.split(delimiter)"
    (line 713 of core/numeric.py in numpy 1.0.4):

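 {{{
 # Replace empty fields (missing values) with the fill value.
 # Use == rather than "is": identity comparison of strings is unreliable.
 vals = [fill if z == '' else z for z in vals]
 }}}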
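 To show both steps working together, here is a standalone sketch of the patched inner loop. The helper parse_line and its usecols handling are hypothetical and are not NumPy's API; the fill default of -999 follows the proposal above. The comparison uses == because "is" tests object identity, which is not guaranteed to hold for equal strings.

```python
def parse_line(line, delimiter=",", fill=-999, usecols=None):
    """Hypothetical helper mirroring the patched inner loop; not part of NumPy."""
    vals = line.strip().split(delimiter)
    # The proposed patch: substitute the fill value for empty fields
    vals = [fill if z == '' else z for z in vals]
    # Optionally keep only some columns (e.g. skip a leading time stamp)
    if usecols is not None:
        vals = [vals[j] for j in usecols]
    # float() accepts both the original strings and the numeric fill value
    return [float(z) for z in vals]


lines = ["1.0,2.0,3.0,4.0,5.0\n", "1.0,2.0,,4.0,5.0\n"]
rows = [parse_line(line) for line in lines]
print(rows)  # [[1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 2.0, -999.0, 4.0, 5.0]]
print(parse_line(lines[1], usecols=[1, 2, 3, 4]))  # [2.0, -999.0, 4.0, 5.0]
```

 Note that the check matches only truly empty fields. A field containing only spaces would still reach float() and fail. Testing z.strip() == '' instead would cover that case, at a small extra cost per field.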

 ----

 == Performance ==


 Test case: load an 18,000-line ASCII dataset with 22 float variables on
 each line, skipping the first column (it's a time stamp).

 Timings using %timeit in IPython:

 Reading an ASCII file with no missing values using the current version of
 loadtxt:[[BR]]

 ***10 loops, best of 3: 703 ms per loop

 Reading the same ASCII file (no missing values) using the proposed changes
 to loadtxt:[[BR]]

 ***10 loops, best of 3: 801 ms per loop

 The changes do impose a ''slight'' performance hit (801 ms vs. 703 ms,
 roughly 14% slower) on those who use loadtxt to read well-behaved ASCII
 data.  If this is an issue, could a separate loadtxt2 function be added
 instead?
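 One possible way to keep most of that cost off well-behaved files, sketched here as an untested assumption rather than a measured result, is to check each line for an empty field first and only build a new list when one is found. The names fill_always and fill_missing are hypothetical; the timing loop uses a synthetic 22-field line and prints whatever your machine measures.

```python
import timeit


def fill_always(vals, fill=-999):
    """The proposed patch as written: always builds a new list."""
    return [fill if z == '' else z for z in vals]


def fill_missing(vals, fill=-999):
    """Hypothetical variant: copies the list only if an empty field is present."""
    # A membership test scans the list without allocating, so lines
    # without missing values are returned unchanged (same list object).
    if '' not in vals:
        return vals
    return fill_always(vals, fill)


# One synthetic well-behaved line of 22 float fields (an assumption)
vals = ",".join(["1.5"] * 22).split(",")
for func in (fill_always, fill_missing):
    # 18,000 calls, matching the line count of the benchmark above
    seconds = timeit.timeit(lambda: func(vals), number=18000)
    print(func.__name__, round(seconds * 1000, 2), "ms")
```

 Because fill_missing may return the very list it was given, a caller must not rely on getting a copy; inside loadtxt the split list is discarded after conversion, so this should not matter there.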

-- 
Ticket URL: <http://scipy.org/scipy/numpy/ticket/717>
NumPy <http://projects.scipy.org/scipy/numpy>
The fundamental package needed for scientific computing with Python.
