[NumPy-Tickets] [NumPy] #2184: Remove

NumPy Trac numpy-tickets@scipy....
Wed Jul 11 16:51:06 CDT 2012


#2184: Remove
-----------------------+----------------------------------------------------
 Reporter:  khaeru     |       Owner:  somebody   
     Type:  defect     |      Status:  new        
 Priority:  normal     |   Milestone:  Unscheduled
Component:  numpy.lib  |     Version:  1.6.1      
 Keywords:             |  
-----------------------+----------------------------------------------------
 The documentation for `genfromtxt()` reads:

   When the variables are named (either by a flexible dtype or with
 ''names'', there must not be any header in the file (else a ValueError
 exception is raised).

 and also:

   If ''names'' is True, the field names are read from the first valid line
 after the first ''skip_header'' lines.

 The cause of this restriction seems to be in
 [https://github.com/numpy/numpy/blob/master/numpy/lib/npyio.py#L1347
 numpy/lib/npyio.py at lines 1347-9]:

 {{{
     if names is True:
         if comments in first_line:
             first_line = asbytes('').join(first_line.split(comments)[1:])
 }}}

 '''The last line should read `first_line =
 first_line.split(comments)[0]`.'''

 With the current code, the input line:
 {{{
 # Example comment line
 }}}
 will be transformed to:
 {{{
 Example comment line
 }}}
 resulting in columns named 'Example', 'comment' and 'line' (this is what
 the warning in the documentation is about).

 But also the input line:
 {{{
 ColumnA ColumnB ColumnC # the column names precede this comment
 }}}
 will be transformed to:
 {{{
 the column names precede this comment
 }}}
 resulting in columns named 'the', 'column', 'names', and so on. In this
 instance the actual column names present in the file are inappropriately
 discarded.
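
 The two transformations above can be reproduced without NumPy. The
 following is only a minimal sketch: `current_header` is a hypothetical
 helper that mirrors the quoted snippet, and the plain whitespace split
 stands in for whatever `genfromtxt()` actually does to split the line.

```python
def current_header(first_line, comments=b"#"):
    """Hypothetical stand-in for the quoted snippet (not NumPy's real code)."""
    if comments in first_line:
        # Current behaviour: keep everything AFTER the first comment marker.
        first_line = b"".join(first_line.split(comments)[1:])
    # Assumption: whitespace-delimited columns.
    return first_line.split()


# A pure comment line is turned into column names.
print(current_header(b"# Example comment line"))

# Real column names before a trailing comment are thrown away.
print(current_header(
    b"ColumnA ColumnB ColumnC # the column names precede this comment"))
```

 In both cases the text after the comment marker is what ends up being
 used as field names.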

 By taking the `[0]` portion of the split instead of `[1:]`:
  * Lines beginning with comments result in an empty string being passed to
 `split_lines()` on L1350, producing no usable output and causing the
 `while not first_values` loop to try the next line.
  * Partial-line comments following actual heading names are discarded,
 instead of the names themselves.
  * As a result, files can have commented headers of any length ''and''
 column names, simultaneously.
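
 The effect of the proposed change can be sketched in the same spirit.
 `proposed_names` is a hypothetical helper, the `for` loop only
 approximates the `while not first_values` loop mentioned above, and
 whitespace-delimited columns are assumed.

```python
def proposed_names(lines, comments=b"#"):
    """Hypothetical sketch of the proposed `[0]` split (not NumPy's real code)."""
    for line in lines:
        # Proposed behaviour: keep only the text BEFORE the comment marker.
        values = line.split(comments)[0].split()
        if values:
            # First line with usable content supplies the names.
            return values
        # Otherwise the line was empty or comment-only: try the next one.
    return []


sample = [
    b"# Example comment line",
    b"# a second line of commented header",
    b"ColumnA ColumnB ColumnC # the column names precede this comment",
    b"1 2 3",
]
print(proposed_names(sample))
```

 Under these assumptions the commented header lines are skipped and the
 names are taken from the `ColumnA ColumnB ColumnC` line.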

-- 
Ticket URL: <http://projects.scipy.org/numpy/ticket/2184>
NumPy <http://projects.scipy.org/numpy>