[Numpy-discussion] loadtxt/savetxt tickets

Ralf Gommers ralf.gommers@googlemail....
Thu Mar 31 10:08:27 CDT 2011


On Thu, Mar 31, 2011 at 5:03 PM, Bruce Southey <bsouthey@gmail.com> wrote:
> On Wed, Mar 30, 2011 at 9:53 PM, Charles R Harris
> <charlesr.harris@gmail.com> wrote:
>>
>>
>> On Sun, Mar 27, 2011 at 4:09 AM, Paul Anton Letnes
>> <paul.anton.letnes@gmail.com> wrote:
>>>
>>> On 26. mars 2011, at 21.44, Derek Homeier wrote:
>>>
>>> > Hi Paul,
>>> >
>>> > having had a look at the other tickets you dug up,
>>> >
> [snip]
>>>
>>> >> 1071:
>>> >>      It is not clear to me whether loadtxt is supposed to support
>>> >> missing values in the fashion indicated in the ticket.
>>> >
>>> > In principle it should at least allow you to, by the use of converters
>>> > as described there.
>>> > The problem is, the default delimiter is described as 'any
>>> > whitespace', which in the
>>> > present implementation obviously includes any number of blanks or
>>> > tabs. These
>>> > are therefore treated differently from delimiters like ',' or '&'. I'd
>>> > reckon there are
>>> > too many people actually relying on this behaviour to silently change it
>>> > (e.g. I know plenty of tables with columns separated by either one or
>>> > several
>>> > tabs depending on the length of the previous entry). But the tab is
>>> > apparently also
>>> > treated differently if explicitly specified with "delimiter='\t'" -
>>> > and in that case using
>>> > a converter à la {2: lambda s: float(s or 'Nan')} is working for
>>> > fields in the middle of
>>> > the line, but not at the end - clearly warrants improvement. I've
>>> > prepared a patch
>>> > working for Python3 as well.
>>>
>>> Great!
>>>
> This is an invalid ticket because the docstring clearly states, in
> three different yet critical places, that missing values are not
> handled here:
>
> "Each row in the text file must have the same number of values."
> "genfromtxt : Load data with missing values handled as specified."
> "This function aims to be a fast reader for simply formatted files. The
> `genfromtxt` function provides more sophisticated handling of, e.g.,
> lines with missing values."
>
> Really, I am trying to separate the usage of loadtxt and genfromtxt to
> avoid unnecessary duplication and confusion. Part of this is
> historical, because loadtxt was added in 2007 and genfromtxt was added
> in 2009. So certain features of loadtxt have been 'kept' for
> backwards compatibility, yet these features can be 'abused' to
> handle missing data. But I really consider that any missing value
> should cause loadtxt to fail.

I agree with you Bruce, but it would be easier to discuss this on the
tickets instead of here. Could you add your comments there please?

Ralf


> The patch is incorrect because it should not include a space in the
> split(), as indicated in the comment by the original reporter. Of
> course, a corrected patch alone is still not sufficient to address the
> problem unless the user provides the correct converter. You also
> start to run into problems with multiple delimiters (such as one space
> versus two spaces), so you start down the path of adding all the
> features that duplicate genfromtxt.
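
For reference, the delimiter behaviour discussed in this thread can be
reproduced with plain Python string methods. This is only a sketch: it
does not use or reproduce loadtxt's actual code, the sample row and the
helper name to_float are made up for illustration, and the strip() step
is an assumption about one way a trailing empty field could get lost.

```python
# Plain-Python sketch; NOT numpy's implementation. The row and helper
# name below are hypothetical, chosen only to illustrate the discussion.

def to_float(field):
    """Converter in the spirit of `lambda s: float(s or 'Nan')`."""
    return float(field or "nan")

# Tab-separated row with an empty field in the middle and one at the end.
row = "1.0\t\t3.0\t\n"

# 'Any whitespace' splitting collapses runs of blanks/tabs, so empty
# fields disappear and the row looks two columns wide.
whitespace_fields = row.split()

# An explicit delimiter keeps empty fields, including the trailing one.
tab_fields = row.rstrip("\n").split("\t")

# Assumption: stripping all whitespace first also removes the trailing
# tab, so the empty field at the end of the line is lost.
stripped_fields = row.strip().split("\t")

# With every empty field preserved, the converter can fill in NaN.
values = [to_float(f) for f in tab_fields]

# One space versus two spaces with an explicit ' ' delimiter: the second
# space produces an extra empty field.
space_fields = "1 2  3".split(" ")

print(whitespace_fields, tab_fields, stripped_fields, values, space_fields)
```

With whitespace splitting there is nothing left for a converter to act
on, which is why handling missing values this way depends on an explicit
delimiter being given.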

