[Numpy-discussion] np.loadtxt : yet a new implementation...
Tue Dec 2 13:56:26 CST 2008
Zachary Pincus wrote:
> Specifically, on line 115 in LineSplitter, we have:
> self.delimiter = delimiter.strip() or None
> so if I pass in, say, '\t' as the delimiter, self.delimiter gets set
> to None, which then causes the default behavior of any-whitespace-is-delimiter
> to be used. This makes lines like "Gene Name\tPubMed ID\tStarting Position"
> get split wrong, even when I explicitly pass in '\t' as the delimiter!
> Similarly, I believe that some of the tests are formulated wrong:
> def test_nodelimiter(self):
>     "Test LineSplitter w/o delimiter"
>     strg = " 1 2 3 4  5 # test"
>     test = LineSplitter(' ')(strg)
>     assert_equal(test, ['1', '2', '3', '4', '5'])
> I think that treating an explicitly-passed-in ' ' delimiter as
> identical to 'no delimiter' is a bad idea. If I say that ' ' is the
> delimiter, or '\t' is the delimiter, this should be treated *just*
> like ',' being the delimiter, where the expected output is:
> ['1', '2', '3', '4', '', '5']
> At least, that's what I would expect. Treating contiguous blocks of
> whitespace as single delimiters is perfectly reasonable when None is
> provided as the delimiter, but when an explicit delimiter has been
> provided, it strikes me that the code shouldn't try to interpret it
> any further...
> Does anyone else have any opinion here?
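The problem is easy to reproduce with plain Python string methods. The helper `normalize` below is hypothetical: it copies only the single line quoted from LineSplitter, not the rest of that class, and shows how a tab or space delimiter collapses to None and triggers any-whitespace splitting.

```python
# Mimics the quoted line from LineSplitter:
#     self.delimiter = delimiter.strip() or None
# No NumPy import is needed for this demonstration.

def normalize(delimiter):
    """Hypothetical helper: whitespace-only delimiters collapse to None."""
    return delimiter.strip() or None

header = "Gene Name\tPubMed ID\tStarting Position"

# '\t'.strip() and ' '.strip() are both '', and '' or None is None.
tab_result = normalize('\t')
space_result = normalize(' ')
comma_result = normalize(',')

# With None, str.split treats any whitespace run as one separator,
# so the spaces inside the field names split the header too.
wrong = header.split(tab_result)
# Splitting on the tab that was actually requested keeps the fields intact.
right = header.split('\t')

# The same distinction applies to the test string (comment already removed):
collapsed = "1 2 3 4  5".split(None)   # runs of spaces act as one delimiter
literal = "1 2 3 4  5".split(' ')      # the double space yields an empty field

print(wrong)
print(right)
print(collapsed)
print(literal)
```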
I agree. If the user explicitly passes something as a delimiter, we
should use it and not try to be too smart.
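A minimal sketch of that behavior, assuming a hypothetical `split_line` helper (this is not NumPy's LineSplitter; its comment handling and whitespace trimming are assumptions): None keeps the any-whitespace splitting, while any explicit string, including ' ' or '\t', is used literally, just as ',' would be.

```python
def split_line(line, delimiter=None, comments="#"):
    """Hypothetical line splitter, not NumPy's LineSplitter.

    delimiter=None -> any run of whitespace separates fields.
    any other str  -> used literally, so ' ' and tab are never
                      reinterpreted as "any whitespace".
    """
    if comments is not None:
        # Keep only the part of the line before the comment marker.
        line = line.split(comments, 1)[0]
    # Assumption: surrounding whitespace (including the newline) is trimmed.
    line = line.strip()
    if not line:
        return []
    if delimiter is None:
        # No delimiter given: whitespace runs act as a single separator.
        return line.split()
    # Explicit delimiter: split on it literally, then trim each field.
    return [field.strip() for field in line.split(delimiter)]


strg = " 1 2 3 4  5 # test"
print(split_line(strg))
print(split_line(strg, " "))
print(split_line("Gene Name\tPubMed ID\tStarting Position", "\t"))
```

Trimming the whole line before splitting reproduces the expected output quoted above for the ' ' case, but it also means a tab-delimited line would lose empty leading or trailing fields; a real implementation might prefer to trim only the line terminator.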
Graduate Research Assistant
School of Meteorology
University of Oklahoma