[Numpy-discussion] genloadtxt : last call

Ryan May rmay31@gmail....
Tue Dec 16 17:07:14 CST 2008


Pierre GM wrote:
> All,
> Here's the latest version of genloadtxt, with some recent corrections. 
> With just a couple of tweaks, we end up with some decent speed: it's 
> still slower than np.loadtxt, but only 15% so according to the test at 
> the end of the package.

I have one more usage issue that you may or may not want to fix. The problem is that 
missing values are specified by their string representation, so a field in the file 
that has the same numeric value as the missing marker can still fail to match it when 
the two are compared as strings.  For instance, if you specify that -999.0 represents 
a missing value, but the value written to the file is -999.00, the -999.00 data point 
won't be masked.  A test case should help here:

     def test_withmissing_float(self):
         # The file spells the missing value as -999.00 ...
         data = StringIO.StringIO('A,B\n0,1.5\n2,-999.00')
         # ... while the missing marker is given as the string '-999.0'.
         test = mloadtxt(data, dtype=None, delimiter=',', missing='-999.0',
                         names=True)
         # Expected: the second entry of column B is masked.
         control = ma.array([(0, 1.5), (2, -1.)],
                            mask=[(False, False), (False, True)],
                            dtype=[('A', np.int), ('B', np.float)])
         print control
         print test
         assert_equal(test, control)
         assert_equal(test.mask, control.mask)

Right now this fails with the latest version of genloadtxt.  I've worked around 
this by specifying a whole bunch of string representations of the values, but I 
wasn't sure if you knew of a better way that this could be handled within 
genloadtxt.  I can only think of two ways, though I'm not thrilled with either:

1) Call the converter on the string form of the missing value and compare the result 
against the converted value from the file to decide whether it is missing. (Probably 
very slow.)

2) Accept a list of objects (ints, floats, etc.) and compare each converted value 
against it to decide whether that value is missing. This might needlessly complicate 
the function, which I know you've already taken pains to optimize.
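
To make the difference concrete, here is a minimal sketch of the two options next to 
the plain string comparison. It is independent of genloadtxt's internals: the helper 
names (is_missing_by_string, is_missing_by_converter, is_missing_by_value) and the 
choice of float as the converter are assumptions made for illustration, not anything 
taken from genloadtxt itself.

```python
def is_missing_by_string(field, missing):
    """String comparison, as in the failing case: '-999.00' != '-999.0'."""
    return field.strip() == missing


def is_missing_by_converter(field, missing, converter=float):
    """Option 1 (hypothetical helper): convert marker and field, then compare."""
    try:
        return converter(field) == converter(missing)
    except ValueError:
        # A field that cannot be converted is not treated as missing here.
        return False


def is_missing_by_value(field, missing_values, converter=float):
    """Option 2 (hypothetical helper): compare the converted field to objects."""
    try:
        return converter(field) in missing_values
    except ValueError:
        return False


print(is_missing_by_string('-999.00', '-999.0'))        # False
print(is_missing_by_converter('-999.00', '-999.0'))     # True
print(is_missing_by_value('-999.00', [-999, -999.0]))   # True
```

Both options cost an extra conversion or lookup per field, which is the speed concern 
noted above; option 1 could at least convert the marker once up front rather than for 
every field.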

If there's no good way to do it, I'm content to live with a workaround.

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma

