[Numpy-discussion] genloadtxt : last call

Pierre GM pgmdevlist@gmail....
Tue Dec 16 17:34:13 CST 2008


Ryan,
OK, I'll look into that. I won't have time to address it before next
week, however. Option #2 looks like the best.

In other news, I was considering renaming genloadtxt to genfromtxt,  
and using ndfromtxt, mafromtxt, recfromtxt, recfromcsv for the  
function names. That way, loadtxt is untouched.



On Dec 16, 2008, at 6:07 PM, Ryan May wrote:

> Pierre GM wrote:
>> All,
>> Here's the latest version of genloadtxt, with some recent corrections.
>> With just a couple of tweaks, we end up with some decent speed: it's
>> still slower than np.loadtxt, but only 15% slower according to the test
>> at the end of the package.
>
> I have one more usage issue that you may or may not want to fix. My
> problem is that missing "values" are specified by their string
> representation, so a field in the file may not compare equal to the
> missing-value string even when both have the same numeric value. For
> instance, if you specify that -999.0 represents a missing value, but the
> value written to the file is -999.00, you won't end up masking the
> -999.00 data point. I'm sure a test case will help here:
>
>     def test_withmissing_float(self):
>         data = StringIO.StringIO('A,B\n0,1.5\n2,-999.00')
>         test = mloadtxt(data, dtype=None, delimiter=',', missing='-999.0',
>                         names=True)
>         control = ma.array([(0, 1.5), (2, -1.)],
>                            mask=[(False, False), (False, True)],
>                            dtype=[('A', np.int), ('B', np.float)])
>         print control
>         print test
>         assert_equal(test, control)
>         assert_equal(test.mask, control.mask)
>
> Right now this fails with the latest version of genloadtxt. I've worked
> around this by specifying a whole bunch of string representations of the
> values, but I wasn't sure if you knew of a better way that this could be
> handled within genloadtxt. I can only think of two ways, though I'm not
> thrilled with either:
>
> 1) Call the converter on the string form of the missing value and compare
> against the converted value from the file to determine if it's missing.
> (Probably very slow.)
>
> 2) Add a list of objects (ints, floats, etc.) to compare against after
> conversion to determine if they're missing. This might needlessly
> complicate the function, which I know you've already taken pains to
> optimize.
>
> If there's no good way to do it, I'm content to live with a workaround.
>
> Ryan
>
> -- 
> Ryan May
> Graduate Research Assistant
> School of Meteorology
> University of Oklahoma
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion


