[Numpy-discussion] genloadtxt : last call
Pierre GM
pgmdevlist@gmail....
Tue Dec 16 17:34:13 CST 2008
Ryan,
OK, I'll look into that, though I won't have time to address it before
next week. Option #2 looks like the best one.
In other news, I was considering renaming genloadtxt to genfromtxt,
and using ndfromtxt, mafromtxt, recfromtxt and recfromcsv as the names
of the related functions. That way, loadtxt stays untouched.
On Dec 16, 2008, at 6:07 PM, Ryan May wrote:
> Pierre GM wrote:
>> All,
>> Here's the latest version of genloadtxt, with some recent corrections.
>> With just a couple of tweaks, we end up with some decent speed: it's
>> still slower than np.loadtxt, but only by 15% according to the test
>> at the end of the package.
>
> I have one more usage issue that you may or may not want to fix. My
> problem is that missing "values" are specified by their string
> representation, so a string marking a missing value may not compare
> equal to the string in the file, even when both have the same numeric
> value. For instance, if you specify that -999.0 represents a missing
> value, but the value written to the file is -999.00, you won't end up
> masking the -999.00 data point. I'm sure a test case will help here:
>
> def test_withmissing_float(self):
>     data = StringIO.StringIO('A,B\n0,1.5\n2,-999.00')
>     test = mloadtxt(data, dtype=None, delimiter=',', missing='-999.0',
>                     names=True)
>     control = ma.array([(0, 1.5), (2, -1.)],
>                        mask=[(False, False), (False, True)],
>                        dtype=[('A', np.int), ('B', np.float)])
>     print control
>     print test
>     assert_equal(test, control)
>     assert_equal(test.mask, control.mask)
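A minimal illustration of the mismatch, using only the two strings from the test above: they differ as strings but agree once converted to float.

```python
# The missing-value marker is matched as a string, so two spellings of the
# same number are treated as different tokens.
sentinel = '-999.0'   # marker passed by the caller
field = '-999.00'     # token actually found in the file

print(field == sentinel)                # False: string comparison
print(float(field) == float(sentinel))  # True: numeric comparison
```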
>
> Right now this fails with the latest version of genloadtxt. I've worked
> around this by specifying a whole bunch of string representations of
> the values, but I wasn't sure if you knew of a better way that this
> could be handled within genloadtxt. I can only think of two ways,
> though I'm not thrilled with either:
>
> 1) Call the converter on the string form of the missing value and
> compare against the converted value from the file to determine whether
> it is missing. (Probably very slow)
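A minimal sketch of option 1, assuming one converter per column (here `float`). The helper `convert_column` and its `fill` parameter are hypothetical and not part of genloadtxt. Each marker is converted once, before the data are read, so the extra per-value cost is a set lookup rather than a second conversion. Markers the converter cannot parse (such as `'N/A'` for a float column) can still match as raw strings.

```python
def convert_column(tokens, converter, missing_strings, fill=None):
    """Hypothetical helper: decide 'missing' by comparing converted values
    rather than raw strings. Returns (values, mask)."""
    raw_missing = set(missing_strings)
    # Convert every marker once, up front.
    converted_missing = set()
    for marker in missing_strings:
        try:
            converted_missing.add(converter(marker))
        except ValueError:
            pass  # unparseable marker: only an exact string match is possible

    values, mask = [], []
    for token in tokens:
        token = token.strip()
        if token in raw_missing:
            # Exact string match: skip conversion entirely.
            values.append(fill)
            mask.append(True)
            continue
        value = converter(token)
        if value in converted_missing:
            # Same number, different spelling (e.g. '-999.00' vs '-999.0').
            values.append(fill)
            mask.append(True)
        else:
            values.append(value)
            mask.append(False)
    return values, mask


values, mask = convert_column(['0', '1.5', '-999.00', 'N/A'], float,
                              ['-999.0', 'N/A'], fill=-1.0)
print(values)  # [0.0, 1.5, -1.0, -1.0]
print(mask)    # [False, False, True, True]
```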
>
> 2) Add a list of objects (ints, floats, etc.) to compare against after
> conversion to determine if they're missing. This might needlessly
> complicate the function, which I know you've already taken pains to
> optimize.
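A minimal sketch of option 2, again with a hypothetical helper (`mask_missing` is not a genloadtxt function). Here the caller passes the markers directly as Python objects, and they are compared against already-converted values, so an int marker `-999` matches the float `-999.0`.

```python
def mask_missing(values, missing_objects):
    """Hypothetical helper: flag converted values that equal any of the
    caller-supplied missing markers (ints, floats, ...)."""
    return [any(v == m for m in missing_objects) for v in values]


# Values as they come out of a float converter; the marker is given as an int.
converted = [float(tok) for tok in ['0', '1.5', '-999.00']]
print(mask_missing(converted, [-999]))  # [False, False, True]
```

Comparing with `==` against a list keeps the sketch general (markers need not be hashable); for hashable markers, a set lookup would avoid the inner loop.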
>
> If there's no good way to do it, I'm content to live with a workaround.
>
> Ryan
>
> --
> Ryan May
> Graduate Research Assistant
> School of Meteorology
> University of Oklahoma