[Numpy-discussion] np.loadtxt : yet a new implementation...
Tue Dec 2 13:47:38 CST 2008
I've tested the new loadtxt briefly. Looks good, except that there's a
minor bug when trying to use a specific white-space delimiter (e.g.
\t) while still allowing other white-space within fields.
Specifically, on line 115 in LineSplitter, we have:
self.delimiter = delimiter.strip() or None
so if I pass in, say, '\t' as the delimiter, self.delimiter gets set
to None, which then causes the default any-whitespace-is-a-delimiter
behavior to be used. This makes lines like
"Gene Name\tPubMed ID\tStarting Position" get split incorrectly, even
when I explicitly pass in '\t' as the delimiter!
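A minimal sketch of the problem, using only plain str methods rather than the actual LineSplitter code (the variable names here are just for illustration):

```python
# Illustration only: plain str methods, not the real LineSplitter class.
line = "Gene Name\tPubMed ID\tStarting Position"

delimiter = "\t"
# '\t'.strip() is the empty string, which is falsy, so this becomes None.
effective = delimiter.strip() or None

# With None, str.split breaks on any run of whitespace, spaces included.
split_any_ws = line.split(effective)

# With the tab kept as the delimiter, spaces inside fields survive.
split_on_tab = line.split("\t")

print(effective)      # None
print(split_any_ws)   # ['Gene', 'Name', 'PubMed', 'ID', 'Starting', 'Position']
print(split_on_tab)   # ['Gene Name', 'PubMed ID', 'Starting Position']
```

So the three intended fields turn into six as soon as the tab is stripped away.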
Similarly, I believe that some of the tests are formulated incorrectly:
"Test LineSplitter w/o delimiter"
strg = " 1 2 3 4  5 # test"
test = LineSplitter(' ')(strg)
assert_equal(test, ['1', '2', '3', '4', '5'])
I think that treating an explicitly-passed-in ' ' delimiter as
identical to 'no delimiter' is a bad idea. If I say that ' ' is the
delimiter, or '\t' is the delimiter, this should be treated *just*
like ',' being the delimiter, where the expected output is:
['1', '2', '3', '4', '', '5']
At least, that's what I would expect. Treating contiguous blocks of
whitespace as single delimiters is perfectly reasonable when None is
provided as the delimiter, but when an explicit delimiter has been
provided, it strikes me that the code shouldn't try to further-
Does anyone else have any opinion here?
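To make the behavior I'm proposing concrete, here is a rough sketch. split_line is a hypothetical helper, not anything in the posted implementation, and the sample line assumes two consecutive spaces between 4 and 5:

```python
def split_line(line, delimiter=None):
    """Hypothetical helper illustrating the proposed semantics."""
    if delimiter is None:
        # No delimiter given: runs of whitespace count as one separator.
        return line.split()
    # Explicit delimiter: every occurrence separates two fields,
    # so adjacent delimiters produce an empty field.
    return line.split(delimiter)


spaced = "1 2 3 4  5"      # two spaces between 4 and 5
commas = "1,2,3,4,,5"      # same layout with ',' as the delimiter

print(split_line(spaced))        # ['1', '2', '3', '4', '5']
print(split_line(spaced, ' '))   # ['1', '2', '3', '4', '', '5']
print(split_line(commas, ','))   # ['1', '2', '3', '4', '', '5']
```

With this reading, an explicit ' ' behaves exactly like an explicit ',', and only delimiter=None collapses whitespace.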
On Dec 1, 2008, at 1:21 PM, Pierre GM wrote:
> Well, looks like the attachment is too big, so here's the
> implementation. The tests will come in another message.