[Numpy-discussion] np.loadtxt : yet a new implementation...

Zachary Pincus zachary.pincus@yale....
Tue Dec 2 13:47:38 CST 2008

Hi Pierre,

I've tested the new loadtxt briefly. Looks good, except that there's a
minor bug when trying to use a specific white-space delimiter (e.g.
\t) while still allowing other white-space (e.g. spaces) to appear
within fields.

Specifically, on line 115 in LineSplitter, we have:
             self.delimiter = delimiter.strip() or None
so if I pass in, say, '\t' as the delimiter, '\t'.strip() returns an
empty string and self.delimiter gets set to None, which then causes the
default any-whitespace-is-delimiter behavior to be used. This makes
lines like "Gene Name\tPubMed ID\tStarting Position" get split wrong,
even when I explicitly pass in '\t' as the delimiter!
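
To make the failure concrete, here is a minimal reproduction that uses only built-in str methods. It is not the LineSplitter from the proposal, and the variable names are just for illustration; it only mirrors the one assignment quoted above:

```python
# Minimal reproduction with plain str methods (not the proposed LineSplitter).
delimiter = "\t"

# Same expression as the quoted assignment: '\t'.strip() is '', which is
# falsy, so the "or None" branch wins and the explicit tab is lost.
effective = delimiter.strip() or None

line = "Gene Name\tPubMed ID\tStarting Position"

# With None, str.split breaks on any run of whitespace, spaces included.
split_on_any_whitespace = line.split(effective)

# With the tab kept as-is, the spaces inside each field survive.
split_on_tab = line.split(delimiter)

print(split_on_any_whitespace)
print(split_on_tab)
```

The first print shows six fragments, the second the three intended fields.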

Similarly, I believe that some of the tests are formulated wrong:
     def test_nodelimiter(self):
         "Test LineSplitter w/o delimiter"
         strg = " 1 2 3 4  5 # test"
         test = LineSplitter(' ')(strg)
         assert_equal(test, ['1', '2', '3', '4', '5'])

I think that treating an explicitly-passed-in ' ' delimiter as  
identical to 'no delimiter' is a bad idea. If I say that ' ' is the  
delimiter, or '\t' is the delimiter, this should be treated *just*  
like ',' being the delimiter, where the expected output is:
['1', '2', '3', '4', '', '5']

At least, that's what I would expect. Treating contiguous blocks of  
whitespace as single delimiters is perfectly reasonable when None is  
provided as the delimiter, but when an explicit delimiter has been
provided, it strikes me that the code shouldn't try to
further-interpret it...
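
As a rough sketch of the behavior I have in mind (split_line is a hypothetical helper written for this message, not code from the proposal, and it assumes comments are removed and the line is stripped before splitting):

```python
def split_line(line, delimiter=None, comments="#"):
    """Hypothetical helper: split one line, honoring an explicit delimiter."""
    # Drop everything after the comment marker, then trim both ends.
    line = line.split(comments)[0].strip()
    if not line:
        return []
    # delimiter=None: runs of whitespace act as a single separator.
    # Any explicit delimiter, including ' ' or '\t', is used literally,
    # so two adjacent delimiters yield an empty field.
    return line.split(delimiter)


strg = " 1 2 3 4  5 # test"
print(split_line(strg))          # no delimiter given
print(split_line(strg, " "))     # explicit single space
print(split_line("1,2,3,4,,5 # test", ","))
```

With no delimiter this gives the five fields the current test expects; with an explicit ' ' it gives the same six-element result as the ',' case. One caveat of this sketch: stripping the line first also discards empty leading or trailing fields when the delimiter is itself whitespace, which may or may not be what is wanted.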

Does anyone else have any opinion here?


On Dec 1, 2008, at 1:21 PM, Pierre GM wrote:

> Well, looks like the attachment is too big, so here's the  
> implementation. The tests will come in another message.
> <genload_proposal.py>
