[SciPy-User] Suggestion for numpy.genfromtxt documentation

Bruce Southey bsouthey@gmail....
Wed Oct 7 14:20:18 CDT 2009


On 10/07/2009 10:52 AM, Skipper Seabold wrote:
> On Wed, Oct 7, 2009 at 11:25 AM, Dharhas Pothina
> <Dharhas.Pothina@twdb.state.tx.us>  wrote:
>    
>> Hi,
>>
>> It took me a while and a lot of trial and error to work out why this didn't work as expected.
>>
>> data = np.genfromtxt(fname,usecols=(2,3,4),names='x,y,z')
>>
>> This command works and does not raise any warnings or errors, but it returns a numpy array with no field names. If you use:
>>
>> data = np.genfromtxt(fname,usecols=(2,3,4),dtype=None,names='x,y,z')
>>
>> then the command does what I expect it to and returns a structured numpy array with field names. So essentially, the 'names' argument does not work unless you also specify the 'dtype' argument.
>>      
What did you actually expect?
It would be very informative if you could provide a simple example of 
this for testing.

There are many combinations of arguments so not all have been tested and 
it is not always clear what the expected behavior should be.
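A minimal test case along these lines could start from an in-memory file. The sample data below is made up for illustration. It loads the same columns twice, once with the default dtype and once with dtype=None, and prints the field names of each result. On the NumPy version used in this thread, the first call reportedly returned no field names. Other versions may behave differently, and printing dtype.names shows which behavior a given installation has.

```python
import io

import numpy as np

# Five whitespace-separated numeric columns; only columns 2-4 are wanted.
SAMPLE = "1 2 3.5 4.5 5.5\n6 7 8.5 9.5 10.5\n"


def load(**extra):
    # A fresh StringIO for every call, since genfromtxt consumes the stream.
    return np.genfromtxt(io.StringIO(SAMPLE), usecols=(2, 3, 4),
                         names="x,y,z", **extra)


default_dtype = load()             # dtype left at its default (float)
inferred_dtype = load(dtype=None)  # dtype inferred from the data

# dtype.names is a tuple of field names, or None for a plain array.
print("default dtype:", default_dtype.dtype.names)
print("dtype=None:   ", inferred_dtype.dtype.names)
```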

>> I think it would be less confusing to new users either to mention this explicitly in the documentation string for the genfromtxt 'names' argument, or to have the function default to 'dtype=None' if the 'names' argument is specified without the 'dtype' argument.
>>
>> - dharhas
>>      
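The second suggestion, defaulting to dtype=None when names is given without dtype, can be sketched as a thin wrapper around np.genfromtxt. `genfromtxt_named` is a hypothetical helper, not part of NumPy, and the sample data is made up.

```python
import io

import numpy as np


def genfromtxt_named(fname, **kwargs):
    """Hypothetical wrapper (not part of NumPy): if names is given but
    dtype is not, pass dtype=None so each column's type is inferred."""
    if kwargs.get("names") is not None and "dtype" not in kwargs:
        kwargs["dtype"] = None
    return np.genfromtxt(fname, **kwargs)


text = io.StringIO("0 1.0 2.0\n0 3.0 4.0\n")
data = genfromtxt_named(text, usecols=(1, 2), names="x,y")
print(data.dtype.names)
```

An explicitly passed dtype is left untouched, so existing calls keep their current behavior.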
> I came across this behavior recently and agree with you.  There is a
> patch in the works for this.
>
> See this thread: http://thread.gmane.org/gmane.comp.python.numeric.general/33479
>
> And this ticket: http://projects.scipy.org/numpy/ticket/1252
>
> Cheers,
>
> Skipper
>    

From the numpy help, there is this example:
data = np.genfromtxt(s, dtype=[('myint', 'i8'), ('myfloat', 'f8'),
                               ('mystring', 'S5')], delimiter=",")

It does not help that the dtype of a structured array also includes the 
actual field names. So I do not think we can use the dtype argument 
without also combining it with the names. Perhaps dtype could be split 
into names and formats, so that dtype=('name', 'format').
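To illustrate the point about names and formats: a NumPy structured dtype stores both for every field, and np.dtype already accepts a dictionary with separate "names" and "formats" lists. A short sketch using the fields from the help example:

```python
import numpy as np

# A structured dtype bundles each field's name with its format.
dt = np.dtype([("myint", "i8"), ("myfloat", "f8"), ("mystring", "S5")])
print(dt.names)                          # the field names only
print([dt[name] for name in dt.names])   # the per-field formats

# The same dtype assembled from separate names and formats lists.
split = np.dtype({"names": ["myint", "myfloat", "mystring"],
                  "formats": ["i8", "f8", "S5"]})
print(split == dt)
```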

In some sense you are suggesting that we should have something like:

Ignoring the use of None and True for the dtype and names arguments:

i) If only dtype is specified, then use the specified dtype and add 
default names such as col1, col2, ... if necessary.

ii) If only names is specified, then construct the dtype as ('name', 
'default format').

iii) If only formats is specified, then construct the dtype as 
('default name', 'format').

iv) If only names and formats are specified, then construct the dtype 
as ('name', 'format').

v) If none of dtype, names, and formats is specified, then construct 
the dtype as ('default name', 'default format').

vi) If dtype and names or formats are specified, then use dtype if it 
is of the form ('name', 'format'); otherwise use one of the previous 
cases.
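The cases above could be sketched as a small helper that builds the dtype before calling genfromtxt. `resolve_dtype` is hypothetical, not a NumPy function. The default format "f8" and the col1, col2, ... default names are assumptions taken from the suggestion. Where case vi) is ambiguous, the sketch assumes that a plain (unnamed) dtype serves as the default format for cases ii)-iv).

```python
import io

import numpy as np

DEFAULT_FORMAT = "f8"  # assumed default format (genfromtxt defaults to float)


def default_names(n):
    # Default names such as col1, col2, ... as suggested above.
    return ["col%d" % (i + 1) for i in range(n)]


def resolve_dtype(ncols, dtype=None, names=None, formats=None):
    """Hypothetical helper (not part of NumPy) sketching cases i)-vi)."""
    if isinstance(names, str):
        names = [n.strip() for n in names.split(",")]
    if dtype is not None:
        dtype = np.dtype(dtype)
        if dtype.names is not None:
            # Cases i) and vi): already of the form ('name', 'format').
            return dtype
        if names is None and formats is None:
            # Case i): plain dtype only, repeated under default names.
            return np.dtype([(n, dtype) for n in default_names(ncols)])
        # Case vi) fallback (assumption): plain dtype is the default format.
        default_fmt = dtype
    else:
        default_fmt = np.dtype(DEFAULT_FORMAT)

    if names is None and formats is None:
        # Case v): default names and default format.
        names = default_names(ncols)
        formats = [default_fmt] * ncols
    elif formats is None:
        # Case ii): given names, default format.
        formats = [default_fmt] * len(names)
    elif names is None:
        # Case iii): default names, given formats.
        names = default_names(len(formats))
    # Case iv): pair each name with its format.
    if len(names) != len(formats):
        raise ValueError("names and formats must have the same length")
    return np.dtype(list(zip(names, formats)))


# Usage: build the dtype first, then hand it to genfromtxt.
text = io.StringIO("1 2 3.5 4.5 5.5\n6 7 8.5 9.5 10.5\n")
data = np.genfromtxt(text, usecols=(2, 3, 4),
                     dtype=resolve_dtype(3, names="x,y,z"))
print(data.dtype.names)
```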

When dtype is None, the format is also None, so the format is inferred 
from the data. If names is not True, then the names come either from 
the names argument or from default values.
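For example, with dtype=None and explicit names, np.genfromtxt infers each field's type from the data while taking the field names from the names argument. The sample data is made up:

```python
import io

import numpy as np

# Integers in the first column, floats in the second.
text = io.StringIO("1 2.5\n3 4.5\n")
data = np.genfromtxt(text, dtype=None, names="count,value")

# Names come from the argument; each field's format is inferred.
for name in data.dtype.names:
    print(name, data.dtype[name])
```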

If the names argument is True, then the names should be read from the 
data, and one of the previous cases applies.
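Likewise, with names=True, np.genfromtxt reads the field names from the first line of the input. Here is a small sketch with made-up data:

```python
import io

import numpy as np

# names=True: the first line is read as the header of field names.
text = io.StringIO("x y z\n1 2 3\n4 5 6\n")
data = np.genfromtxt(text, dtype=None, names=True)
print(data.dtype.names)
print(data["y"])
```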

Bruce







