[Numpy-discussion] loadtxt and missing values
Thu Mar 6 14:50:36 CST 2008
On Thu, Mar 6, 2008 at 12:36 PM, <firstname.lastname@example.org> wrote:
> Proposed solution:
> It's probably not the best way (noob, that's me), but this situation could be fixed by:
> 1) add a fill keyword to loadtxt such that
> def loadtxt(...,fill=-999):
> 2) add the following after the line "vals = line.split(delimiter)" (line 713 in core/numeric.py , numpy 1.0.4)
> for j in range(0,len(vals)):
> if vals[j] != '':
> Testing: -------------------------
> Load an 18,000-line ASCII dataset with 22 float variables on each line, skipping the first column (it's a time stamp).
> Timings using %timeit in ipython:
> Reading an ascii file with no missing values using the current version of loadtxt:
> ***10 loops, best of 3: 704 ms per loop
> Reading an ascii file with no missing values using the proposed changes to loadtxt:
> ***10 loops, best of 3: 802 ms per loop
> The changes do create a slight performance hit (roughly 14% in this test) for those who use loadtxt to read in well-behaved ASCII data. If this is an issue, could a loadtxt2 function be added?
I haven't used loadtxt so I don't have an opinion on changing it. But
would this be faster instead of a for loop?
vals = [(z, fill)[z == ''] for z in vals]
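
For comparison, here is a minimal sketch of both approaches as standalone functions. It is not the actual loadtxt code: the function names, the sample line, and passing fill as a string are assumptions made for illustration. The comparison uses == because "is" tests object identity, which is not guaranteed to hold for two equal strings.

```python
def fill_missing_loop(vals, fill="-999"):
    """Explicit loop, in the spirit of the proposed patch (hypothetical helper)."""
    out = list(vals)  # copy so the caller's list is left untouched
    for j in range(len(out)):
        if out[j] == "":
            out[j] = fill
    return out


def fill_missing_comprehension(vals, fill="-999"):
    """List-comprehension form of the same replacement (hypothetical helper)."""
    return [fill if z == "" else z for z in vals]


# Made-up sample line: a time stamp followed by floats, two of them missing.
line = "2008-03-06,1.5,,3.25,"
vals = line.split(",")

filled = fill_missing_comprehension(vals)
row = [float(v) for v in filled[1:]]  # skip the time stamp column
```

Both functions return the same list, so the choice between them comes down to speed and readability; timing each with %timeit on real data would settle the question.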