[NumPy-Tickets] [NumPy] #1665: genfromtxt returns inconsistent output if column converters are used

NumPy Trac numpy-tickets@scipy....
Fri Nov 12 10:14:23 CST 2010


#1665: genfromtxt returns inconsistent output if column converters are used
--------------------+-------------------------------------------------------
 Reporter:  dmoore  |       Owner:  somebody      
     Type:  defect  |      Status:  needs_decision
 Priority:  normal  |   Milestone:  2.0.0         
Component:  Other   |     Version:  1.4.1         
 Keywords:          |  
--------------------+-------------------------------------------------------

Comment(by dmoore):

 >As you've seen, the output of your converter is not detected as a float,
 >but as an object. That's an unfortunate side effect of using a lambda
 >function such as yours: what if your input string has only 1 character?
 >You end up taking the float of an empty string, which raises a ValueError.
 >In practice, that's exactly what happens under the hood when genfromtxt
 >tries to guess the output type of the converter. It tries a single value
 >('1'), fails, and decides that the result must be an object...
 >Probably not the best strategy, as it crashes in your case. But yours is a
 >buggy case anyway.

 Sorry to disagree, but why is my example a buggy case? If I can guarantee
 that all of my input will convert correctly using my converter (i.e. all
 of my column 1 input ALWAYS consists of an alpha char followed by an
 integer) no bugs occur. IMO genfromtxt shouldn't pass invalid values to
 the converter.

 btw, the code I gave is only an example to illustrate the problem. My
 actual code is much more complicated and involves converting dates and
 various strings to numeric values on very large datasets. In all of those
 cases, having to handle the case when genfromtxt passes the value '1'
 makes no sense in the context of my data and cruds up my code. I can at
 least use that as a workaround for now, but it wasn't obvious from the
 docs that genfromtxt worked like that.
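
 To make the disagreement concrete, here is a minimal sketch in plain
 Python. It does not call genfromtxt; it only mimics the probe described in
 the quoted reply (a single test value of '1'). The converter body and the
 names strip_prefix and strip_prefix_tolerant are assumptions for
 illustration, since the original lambda is not reproduced in this comment.

```python
# Assumed shape of the reported converter: drop the leading alpha
# character and parse the remainder as a float.
def strip_prefix(field):
    return float(field[1:])

# Well-formed data (alpha char followed by an integer) converts fine.
value = strip_prefix("a25")

# The probe value from the quoted reply has no prefix, so the slice is
# empty and float('') raises ValueError.
try:
    strip_prefix("1")
    probe_failed = False
except ValueError:
    probe_failed = True

# Workaround sketch: only strip the first character when it is a letter,
# so the probe value converts as well.
def strip_prefix_tolerant(field):
    return float(field[1:]) if field[:1].isalpha() else float(field)
```

 The tolerant version is the kind of extra branch I'd rather not carry in
 every converter just to satisfy the type-guessing step.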

 >You could object that as the dtype is defined, it should take precedence
 >over the output type of the converter. Well, I assumed exactly the
 >opposite: if the user took the time to define a converter, we should
 >respect his/her choice and overwrite the dtype.

 Not sure why a caller would pass a dtype when using the converter if they
 didn't want the dtype to be operative. Here's a use case where respecting
 the dtype would be helpful: column 1:N = float, column N+1 = int, column
 N+2 = date. Converter is used to convert the date to an int type. Now say
 the user wants all types to be int using dtype = int. Currently, I would
 have to use a converter for every column.
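
 As a rough sketch of the outcome I'm after, in plain Python rather than
 genfromtxt: one converter for the date column, and a single default int
 cast for everything else. The helper names (date_to_int, parse_row), the
 sample rows, and the choice of a date ordinal as the int encoding are all
 hypothetical.

```python
from datetime import date

def date_to_int(field):
    # Hypothetical converter: ISO date string -> proleptic Gregorian ordinal.
    return date.fromisoformat(field).toordinal()

def parse_row(line, converters, default=lambda f: int(float(f))):
    # Apply the per-column converter where one is given; otherwise fall
    # back to the default cast, which plays the role of dtype=int here.
    fields = line.strip().split(",")
    return [converters.get(i, default)(f) for i, f in enumerate(fields)]

# Columns 0-1 are floats, column 2 is an int, column 3 is a date.
rows = ["1.5,2.0,7,2010-11-12", "3.0,4.5,8,2010-11-13"]
table = [parse_row(r, {3: date_to_int}) for r in rows]
```

 Only the date column needs a converter here; the default handles the rest.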

-- 
Ticket URL: <http://projects.scipy.org/numpy/ticket/1665#comment:2>
NumPy <http://projects.scipy.org/numpy>

