[SciPy-User] Suggestion for numpy.genfromtxt documentation

Pierre GM pgmdevlist@gmail....
Fri Oct 9 13:27:37 CDT 2009


On Oct 9, 2009, at 11:21 AM, Bruce Southey wrote:
> On 10/09/2009 09:21 AM, Skipper Seabold wrote:

As a disclaimer, I think some of you are misunderstanding the purpose
of defaultfmt. It is meant to be used when fields are expected but no
names are given, as a replacement for numpy's default "f%i". It is not
meant to define new names. Think of it as a way to get around
numpy's default.
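
To make that concrete, here is a minimal sketch. It assumes Python 3 and a
reasonably recent NumPy; the sample data and variable names are made up for
illustration. A structured dtype is requested without any names, so names
have to be generated, and defaultfmt only changes the pattern used for them:

```python
from io import StringIO

import numpy as np

text = "1,2,3\n4,5,6"  # made-up sample data

# Structured dtype requested, no names given: numpy's default "f%i" is used.
default_names = np.genfromtxt(StringIO(text), delimiter=",",
                              dtype=(int, int, float))

# Same call, but defaultfmt replaces the "f%i" pattern.
custom_names = np.genfromtxt(StringIO(text), delimiter=",",
                             dtype=(int, int, float), defaultfmt="var%i")

print(default_names.dtype.names)  # ('f0', 'f1', 'f2')
print(custom_names.dtype.names)   # ('var0', 'var1', 'var2')
```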

>> data = np.genfromtxt(s, delimiter=",") # dtype=float
>>
>> In [54]: data
>> Out[54]: array([ 1.,  2.,  3.])
>>
> Rats, I forgot about plain arrays. But this is a bug because the default
> argument is defaultfmt="f%i".

Wait, "it's not a bug, it's a feature (TM)". Cf. the disclaimer.

> But if this option is kept then I think the
> default argument of defaultfmt should be None.

The default is defaultfmt="f%i", just like numpy.

>> If default names are specified then it doesn't seem to pick them up as
>> of right now.
>>
>> s.seek(0)
>> data = np.genfromtxt(s, delimiter=",", defaultfmt="Var%i")
>>
>> In [79]: data
>> Out[79]: array([ 1.,  2.,  3.])
>>
> This is also a bug.

No, this works as expected: no names given (explicitly through `names`  
or implicitly with `names=True`), no names expected, explicit dtype  
(through the default `dtype=float`), so all is well.
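
A small sketch of that situation, assuming Python 3 and a recent NumPy (the
variable names are mine): with the default `dtype=float` and no names, the
result is a plain array whether or not defaultfmt is passed.

```python
from io import StringIO

import numpy as np

text = "1,2,3"

# Default dtype=float and no names: a plain float array comes back.
plain = np.genfromtxt(StringIO(text), delimiter=",")

# Same call with defaultfmt: still no names expected, so it is not used.
still_plain = np.genfromtxt(StringIO(text), delimiter=",",
                            defaultfmt="Var%i")

print(plain.dtype.names)        # None
print(still_plain.dtype.names)  # None
```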


>>> ii) If only names is specified then construct the dtype as ('name',
>>> 'default format')
>>>
>> s.seek(0)
>> data = np.genfromtxt(s, delimiter=",", names=['var1','var2','var3'])
>> #dtype = float
>>
>> In [57]: data
>> Out[57]:
>> array((1.0, 2.0, 3.0),
>>       dtype=[('var1', '<f8'), ('var2', '<f8'), ('var3', '<f8')])
>>
> Excellent, just what I expected.


>>> iii) If only formats is specified then construct the dtype as
>>> ('default name', 'format')
>>>
>> This doesn't seem to work with the new easy dtype as noted above.
>>
>> But this does
>>
>> data = np.genfromtxt(s, delimiter=",", dtype=(int,int,float),
>> defaultfmt="var%i")
>>
>> In [72]: data
>> Out[72]:
>> array((1, 2, 3.0),
>>       dtype=[('var0', '<i8'), ('var1', '<i8'), ('var2', '<f8')])

Because here, you explicitly want a structured dtype but don't give  
any names, so you end up creating new names from a default format.

>>> v) If no dtype, names and formats are only specified then construct the
>>> dtype as ('default name', 'default format')
>>>
>>>
>> Same case as above I think where
>>
>> s.seek(0)
>> data = np.genfromtxt(s, delimiter=",", defaultfmt="var%i")
>>
>> doesn't work as "expected" to zip float (the default format) with the
>> default name, specified by defaultfmt.

I appreciate the quotes around expected. `defaultfmt` is used only if
a name is expected but can't be found. Here, no names are expected
because the default `dtype=float` applies.

>>> vi) If dtype and names or formats are specified then use dtype if it is
>>> of the form ('name', 'format') or use one of the previous cases.
>>>
>>>
>> This seems to be the case for defaultfmt,
>>
>> s.seek(0)
>> data = np.genfromtxt(s,
>> dtype=[('var1',int),('var2',int),('var3',float)], delimiter=",",
>> defaultfmt="VAR%i")
>>
>> In [99]: data
>> Out[99]:
>> array((1, 2, 3.0),
>>       dtype=[('var1', '<i8'), ('var2', '<i8'), ('var3', '<f8')])
>>
>> But if names is specified, then it's never ignored
>>
>> s.seek(0)
>> data = np.genfromtxt(s,
>> dtype=[('var1',int),('var2',int),('var3',float)], delimiter=",",
>> names=['VAR1','VAR2','VAR3'])
>>
>> In [102]: data
>> Out[102]:
>> array((1, 2, 3.0),
>>       dtype=[('VAR1', '<i8'), ('VAR2', '<i8'), ('VAR3', '<f8')])
>>
>>
> Here the problem is which user input overrides the other. As long as it
> is clearly documented what happens then I do not care (I care when
> things are not stated).

OK, I need to improve the documentation here. Yes, giving `names`
overwrites the names given in dtype.
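
The precedence can be checked with a short sketch along the lines of the
quoted examples (assuming Python 3 and a recent NumPy; `named_dtype` and the
result names are mine): explicit names in the dtype survive a `defaultfmt`,
but are replaced when `names` is passed.

```python
from io import StringIO

import numpy as np

text = "1,2,3"
named_dtype = [("var1", int), ("var2", int), ("var3", float)]

# The dtype already carries names, so defaultfmt has nothing to do.
from_dtype = np.genfromtxt(StringIO(text), delimiter=",",
                           dtype=named_dtype, defaultfmt="VAR%i")

# An explicit `names` argument replaces the names given in the dtype.
overridden = np.genfromtxt(StringIO(text), delimiter=",",
                           dtype=named_dtype,
                           names=["VAR1", "VAR2", "VAR3"])

print(from_dtype.dtype.names)  # ('var1', 'var2', 'var3')
print(overridden.dtype.names)  # ('VAR1', 'VAR2', 'VAR3')
```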


>>> When dtype is None this implies format is None so the format is obtained
>>> from the data. If names is not True then the names are either from the
>>> argument or default values.

Yes, `dtype=None` means that the format has to be defined from the
data themselves. If the resulting dtype turns out to be structured,
you will need names. If `names=True`, they're read from the first valid
line. If `names` is given, then those are used. If `names=None`, then
some are constructed from `defaultfmt`.
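
The three sources of names can be sketched as follows. This assumes Python 3
and a recent NumPy; the data, the `common` helper dict and the names are
made up, and `encoding="utf-8"` is passed only to keep string handling
uniform across NumPy versions.

```python
from io import StringIO

import numpy as np

# Shared keyword arguments for the three calls (hypothetical helper).
common = dict(delimiter=",", dtype=None, encoding="utf-8")

# names=True: names are read from the first valid line.
from_header = np.genfromtxt(StringIO("a,b,c\n1,2.5,x"), names=True, **common)

# names given explicitly: those are used.
explicit = np.genfromtxt(StringIO("1,2.5,x"), names=["p", "q", "r"], **common)

# names=None: names are built from defaultfmt.
generated = np.genfromtxt(StringIO("1,2.5,x"), defaultfmt="col%i", **common)

print(from_header.dtype.names)  # ('a', 'b', 'c')
print(explicit.dtype.names)     # ('p', 'q', 'r')
print(generated.dtype.names)    # ('col0', 'col1', 'col2')
```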

>
> But I think that this could still be handled by the names argument. So
> that if a user does not specify any name (name=None) and no dtype (or
> all columns have the same dtype) then we have to return a plain array.

Nope, it was never the case before: you're returning what can be
estimated from the data.
"1, 1, 1" would give you ints, "1., 1., 1." would give you floats, and
"1, 1., x" would give you (int,float,'|S1').
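
A quick way to see the inference at work, as a sketch assuming Python 3 and
a recent NumPy. The inputs are written without spaces, and with
`encoding="utf-8"` the string column may be reported as a unicode type
rather than '|S1', depending on the NumPy version.

```python
from io import StringIO

import numpy as np

def guess(text):
    """Let genfromtxt estimate the dtype from the data (dtype=None)."""
    return np.genfromtxt(StringIO(text), delimiter=",", dtype=None,
                         encoding="utf-8")

ints = guess("1,1,1")        # all integers: plain integer array
floats = guess("1.,1.,1.")   # all floats: plain float array
mixed = guess("1,1.,x")      # mixed columns: structured array

print(ints.dtype.kind, floats.dtype.kind)  # i f
print(mixed.dtype.names)                   # ('f0', 'f1', 'f2')
print([mixed.dtype[name].kind for name in mixed.dtype.names])
```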

>>> If names argument is True then the names should be read from the data
>>> and one of the previous cases apply.
>>>
>>>
>> It's a bit confusing to think of data type "formats" and have the
>> defaultfmt, perhaps it should be defaultnm?

Well, defaultfmt is a format string, so it should be clear that it's
used to format names.
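
In plain Python terms, the idea is roughly the following. This is only an
illustration of %-formatting a field index into a name, not the library's
internal code:

```python
# A format string with one integer slot, filled with the field index.
defaultfmt = "var%i"
names = [defaultfmt % i for i in range(3)]
print(names)  # ['var0', 'var1', 'var2']

# Any %-style integer format works, e.g. zero-padded indices.
padded = ["var_%02i" % i for i in range(3)]
print(padded)  # ['var_00', 'var_01', 'var_02']
```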

>>
> I agree. With formats, I expect things like different character and
> numeric types. If we can add this to the names argument then we should
> not need it.




>> So in sum, I think we should maybe have a True argument for
>> defaultfmt, maybe change the name to defaultnm to avoid confusion, and
>> have it so the easy dtype construction works with defaultfmt.  I will
>> comment on the open tickets.

No. "Leave `defaultfmt` alone !"...

