[Numpy-discussion] loadtxt/savetxt tickets

Bruce Southey bsouthey@gmail....
Thu Mar 31 10:10:29 CDT 2011


On 03/31/2011 10:08 AM, Ralf Gommers wrote:
> On Thu, Mar 31, 2011 at 5:03 PM, Bruce Southey<bsouthey@gmail.com>  wrote:
>> On Wed, Mar 30, 2011 at 9:53 PM, Charles R Harris
>> <charlesr.harris@gmail.com>  wrote:
>>>
>>> On Sun, Mar 27, 2011 at 4:09 AM, Paul Anton Letnes
>>> <paul.anton.letnes@gmail.com>  wrote:
>>>> On 26. mars 2011, at 21.44, Derek Homeier wrote:
>>>>
>>>>> Hi Paul,
>>>>>
>>>>> having had a look at the other tickets you dug up,
>>>>>
>> [snip]
>>>>>> 1071:
>>>>>>       It is not clear to me whether loadtxt is supposed to support
>>>>>> missing values in the fashion indicated in the ticket.
>>>>> In principle it should at least allow you to, by the use of converters
>>>>> as described there.
>>>>> The problem is, the default delimiter is described as 'any
>>>>> whitespace', which in the
>>>>> present implementation obviously includes any number of blanks or
>>>>> tabs. These
>>>>> are therefore treated differently from delimiters like ',' or '&'. I'd
>>>>> reckon there are
>>>>> too many people actually relying on this behaviour to silently change it
>>>>> (e.g. I know plenty of tables with columns separated by either one or
>>>>> several
>>>>> tabs depending on the length of the previous entry). But the tab is
>>>>> apparently also
>>>>> treated differently if explicitly specified with "delimiter='\t'" -
>>>>> and in that case using
>>>>> a converter à la {2: lambda s: float(s or 'Nan')} is working for
>>>>> fields in the middle of
>>>>> the line, but not at the end - clearly warrants improvement. I've
>>>>> prepared a patch
>>>>> working for Python3 as well.
>>>> Great!
>>>>
>> This is an invalid ticket because the docstring clearly states, in
>> three different yet critical places, that missing values are not
>> handled here:
>>
>> "Each row in the text file must have the same number of values."
>> "genfromtxt : Load data with missing values handled as specified."
>>   "   This function aims to be a fast reader for simply formatted files.  The
>>     `genfromtxt` function provides more sophisticated handling of, e.g.,
>>     lines with missing values."
>>
>> Really I am trying to separate the usage of loadtxt and genfromtxt to
>> avoid unnecessary duplication and confusion. Part of this is
>> historical, because loadtxt was added in 2007 and genfromtxt was added
>> in 2009. So certain features of loadtxt have been 'kept' for
>> backwards compatibility, yet these features can be 'abused' to
>> handle missing data. But I really consider that any missing values
>> should cause loadtxt to fail.
> I agree with you Bruce, but it would be easier to discuss this on the
> tickets instead of here. Could you add your comments there please?
>
> Ralf
>
'Easier' seems a contradiction when you have to use a captcha...
Sure I will add more comments there.
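
For reference, a minimal sketch of the split discussed above, assuming a
current NumPy and a made-up tab-separated table (the sample data and
variable names are illustrative only, not taken from the ticket). It
shows genfromtxt handling an empty field directly, and the converter
workaround for loadtxt quoted earlier, applied to a field in the middle
of the line:

```python
import io

import numpy as np

# Hypothetical sample: three tab-separated columns, with the middle
# field of the second row left empty.
text = "1\t2\t3\n4\t\t6\n"

# genfromtxt treats the empty field as missing; for float output the
# default fill value is nan.
with_nan = np.genfromtxt(io.StringIO(text), delimiter="\t")

# filling_values replaces missing entries with a chosen value instead.
with_zero = np.genfromtxt(io.StringIO(text), delimiter="\t",
                          filling_values=0.0)

# loadtxt has no notion of missing values; the converter below is the
# workaround quoted in this thread, mapping an empty field to nan.
via_converter = np.loadtxt(
    io.StringIO(text),
    delimiter="\t",
    converters={1: lambda s: float(s or "nan")},
)
```

The converter only covers the column it is registered for, so every
column that may be empty needs its own entry, whereas genfromtxt applies
the missing-value handling to the whole table.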

Bruce

