[Numpy-discussion] np.loadtxt : yet a new implementation...
Tue Dec 2 15:48:26 CST 2008
On Dec 2, 2008, at 3:12 PM, Ryan May wrote:
> Pierre GM wrote:
>> Well, looks like the attachment is too big, so here's the
>> implementation. The tests will come in another message.
> A couple of quick nitpicks:
> 1) On line 186 (in the NameValidator class), you use
> excludelist.append() to append a list to the end of a list. I think
> meant to use excludelist.extend()
> 2) When validating a list of names, why do you insist on lower casing
> them? (I'm referring to the call to lower() on line 207). On one
> this would seem nicer than all upper case, but on the other hand this
> can cause confusion for someone who sees certain casing of names in
> file and expects that data to be laid out the same.
I recall a life where names were case-insensitives, so 'dates' and
'Dates' and 'DATES' were the same field. It should be easy enough to
get rid of that limitations, or add a parameter for case-sensitivity
On Dec 2, 2008, at 2:47 PM, Zachary Pincus wrote:
> Specifically, on line 115 in LineSplitter, we have:
> self.delimiter = delimiter.strip() or None
> so if I pass in, say, '\t' as the delimiter, self.delimiter gets set
> to None, which then causes the default behavior of any-whitespace-is-
> delimiter to be used. This makes lines like "Gene Name\tPubMed ID
> \tStarting Position" get split wrong, even when I explicitly pass in
> '\t' as the delimiter!
OK, I'll check that.
> I think that treating an explicitly-passed-in ' ' delimiter as
> identical to 'no delimiter' is a bad idea. If I say that ' ' is the
> delimiter, or '\t' is the delimiter, this should be treated *just*
> like ',' being the delimiter, where the expected output is:
> ['1', '2', '3', '4', '', '5']
Well, all, stay tuned for yet another "yet another implementation..."
> Other than those, it's working fine for me here.
More information about the Numpy-discussion