[Numpy-discussion] np.loadtxt : yet a new implementation...

Ryan May rmay31@gmail....
Tue Dec 2 13:56:26 CST 2008


Zachary Pincus wrote:
> Specifically, on line 115 in LineSplitter, we have:
>              self.delimiter = delimiter.strip() or None
> so if I pass in, say, '\t' as the delimiter, self.delimiter gets set
> to None, which then causes the default any-whitespace-is-delimiter
> behavior to be used. This makes lines like
> "Gene Name\tPubMed ID\tStarting Position" get split wrong, even when
> I explicitly pass in '\t' as the delimiter!
> 
> Similarly, I believe that some of the tests are formulated wrong:
>      def test_nodelimiter(self):
>          "Test LineSplitter w/o delimiter"
>          strg = " 1 2 3 4  5 # test"
>          test = LineSplitter(' ')(strg)
>          assert_equal(test, ['1', '2', '3', '4', '5'])
> 
> I think that treating an explicitly-passed-in ' ' delimiter as  
> identical to 'no delimiter' is a bad idea. If I say that ' ' is the  
> delimiter, or '\t' is the delimiter, this should be treated *just*  
> like ',' being the delimiter, where the expected output is:
> ['1', '2', '3', '4', '', '5']
> 
> At least, that's what I would expect. Treating contiguous blocks of
> whitespace as single delimiters is perfectly reasonable when None is
> provided as the delimiter, but when an explicit delimiter has been
> provided, it strikes me that the code shouldn't try to
> further-interpret it...
> 
> Does anyone else have any opinion here?

I agree.  If the user explicitly passes something as a delimiter, we 
should use it and not try to be too smart.
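
To make the difference concrete, here is a minimal sketch of the behavior
being asked for. It is not the LineSplitter from the patch: split_line and
its arguments are hypothetical names used only for illustration, and it
assumes that only spaces and line endings (not tabs) are trimmed from the
ends of a line. It relies on the standard str.split semantics: split(None)
collapses runs of whitespace, while split(sep) treats every occurrence of
sep as a field boundary.

```python
# Hypothetical helper, NOT the LineSplitter from the patch under discussion.
def split_line(line, delimiter=None, comments="#"):
    """Split one line of text, taking an explicit delimiter literally."""
    # Drop a trailing comment, then trim spaces and line endings only,
    # so that leading or trailing tabs still delimit empty fields.
    line = line.split(comments)[0].strip(" \r\n")
    if not line:
        return []
    # delimiter=None: any run of whitespace separates fields.
    # Explicit delimiter: every occurrence separates fields.
    return [field.strip() for field in line.split(delimiter)]


# The behavior described in the quoted message: '\t'.strip() is the empty
# string, so `delimiter.strip() or None` silently turns '\t' into None.
assert ("\t".strip() or None) is None

header = "Gene Name\tPubMed ID\tStarting Position"
by_whitespace = split_line(header)    # splits "Gene Name" into two fields
by_tab = split_line(header, "\t")     # keeps the three tab-separated fields

strg = " 1 2 3 4  5 # test"
collapsed = split_line(strg)          # double space counts as one separator
literal = split_line(strg, " ")       # double space yields an empty field
```

With this sketch, by_tab keeps "Gene Name" as a single field, and literal
contains the empty string between '4' and '5', matching the output Zachary
expects for an explicit ' ' delimiter.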

+1

Ryan

-- 
Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma

