[SciPy-User] Suggestion for numpy.genfromtxt documentation
Wed Oct 7 15:22:18 CDT 2009
On Wed, Oct 7, 2009 at 3:20 PM, Bruce Southey <email@example.com> wrote:
> On 10/07/2009 10:52 AM, Skipper Seabold wrote:
>> On Wed, Oct 7, 2009 at 11:25 AM, Dharhas Pothina
>> <Dharhas.Pothina@twdb.state.tx.us> wrote:
>>> It took me a while and a lot of trial and error to work out why this didn't work as expected.
>>> data = np.genfromtxt(fname,usecols=(2,3,4),names='x,y,z')
>>> This command runs without any warnings or errors, but returns a plain numpy array with no field names. If you use:
>>> data = np.genfromtxt(fname,usecols=(2,3,4),dtype=None,names='x,y,z')
>>> then the command does what I expect it to and returns a structured numpy array with field names. So essentially, the 'names' argument does not work unless you also specify the 'dtype' argument.
> What did you actually expect?
> It would be very informative if you could provide a simple example of
> this for testing.
> There are many combinations of arguments so not all have been tested and
> it is not always clear what the expected behavior should be.
>>> I think it would be less confusing to new users either to have this explicitly mentioned in the documentation string for the genfromtxt 'names' argument, or to have the function default to 'dtype=None' if the 'names' argument is specified without the 'dtype' argument.
>>> - dharhas
>> I came across this behavior recently and agree with you. There is a
>> patch in the works for this.
>> See this thread: http://thread.gmane.org/gmane.comp.python.numeric.general/33479
>> And this ticket: http://projects.scipy.org/numpy/ticket/1252
> From the numpy help, there is this example:
> data = np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
> ('mystring','S5')], delimiter=",")
These examples were added recently, so they may not be in your version
of numpy if you haven't updated. You can see them here:
> It does not help that the dtype of structured arrays also includes the
> actual name. So I do not think we can use dtype argument without using
> the combination of dtype and name. Perhaps if dtype is split into names
> and formats so that dtype=('name', 'format').
In the first example above, since float is the default for dtype, it's
really dtype=float and names=[...]. The names don't get used, and it
returns a plain ndarray. All it would take is zipping float with
each of the names so that it's a valid dtype. Right now, you could do
dtype="f, f, f" or whatever and names=['var1','var2','var3']. In the
second example, dtype=None determines the actual format of the data
from the data itself and constructs the dtype.
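
To make the two workarounds above concrete, here is a minimal sketch. The
in-memory sample text and the variable names are made up for illustration;
only np.genfromtxt and its dtype, names and delimiter arguments come from
the discussion. How names is handled when dtype is left at its default
depends on the numpy version, so both calls below pass dtype explicitly.

```python
from io import StringIO

import numpy as np

# Made-up sample data standing in for a real file.
text = "1,2.5,3\n4,5.5,6\n"
names = ['var1', 'var2', 'var3']

# Workaround 1: give one format per column and let names label the fields.
explicit = np.genfromtxt(StringIO(text), delimiter=",",
                         dtype="f8,f8,f8", names=names)

# Workaround 2: dtype=None infers each column's format from the data.
inferred = np.genfromtxt(StringIO(text), delimiter=",",
                         dtype=None, names=names)

print(explicit.dtype.names)
print(inferred.dtype)
```

Both calls return a structured array, so explicit['var2'] and
inferred['var2'] select the middle column by name.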
> In some sense you are suggesting that we should have something like:
> Ignore the use of None and True for dtype and names arguments:
I don't think I (at least) am suggesting to ignore anything from the user.
> i) If only dtype is specified, then use the specified dtype and add
> default names such as col1, col2,... if necessary
This is what happens right now, but with f0, f1, ... instead of col1,
col2, ...
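
A quick way to see those default field names, as a sketch with made-up
sample data (the text and variable names here are only for illustration):

```python
from io import StringIO

import numpy as np

# Made-up sample data; the dtype gives formats but no field names.
text = "1,2.5,3\n4,5.5,6\n"
data = np.genfromtxt(StringIO(text), delimiter=",", dtype="i8,f8,i8")

# The fields receive numpy's default names.
print(data.dtype.names)
```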
> ii) If only names is specified, then construct the dtype as ('name',
> 'default format')
Or whatever is passed to dtype. See above.
> iii) If only formats is specified, then construct the dtype as ('default
> name', 'format')
What is formats? Is this the same case as i)? Are you suggesting
adding a formats keyword? I suggested `type` to distinguish between a
real dtype and this non-standard behavior that's being proposed now,
but Pierre doesn't seem to think it's necessary, and I guess I agree
as long as new users don't get too confused by this and it's
documented as non-standard.
> iv) If only names and formats are specified, then construct the
> dtype as ('name', 'format')
> v) If none of dtype, names and formats is specified, then construct the
> dtype as ('default name', 'default format')
> vi) If dtype and names or formats are specified then use dtype if it is
> of the form ('name', 'format') or use one of the previous cases.
> When dtype is None this implies format is None so the format is obtained
> from the data. If names is not True then the names are either from the
> argument or default values.
> If names argument is True then the names should be read from the data
> and one of the previous cases apply.
I think I agree with this, except I don't think the `format` keyword
is totally necessary.
Basically, I want to leave the behavior as is, but if names is True or
a sequence, then they're never ignored and the dtype is constructed
for the user as "expected".
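
Roughly, the dtype construction I have in mind looks like the sketch
below. dtype_from_names is a hypothetical helper written only for this
message; it is not part of numpy and not the actual patch.

```python
import numpy as np


def dtype_from_names(names, fmt=float):
    """Hypothetical helper: pair every name with one scalar format."""
    # Accept either a comma-separated string or a sequence of names.
    if isinstance(names, str):
        names = [name.strip() for name in names.split(",")]
    # Zip the single format with each name to get a valid structured dtype.
    return np.dtype([(name, fmt) for name in names])


dt = dtype_from_names("x,y,z")
print(dt.names)
```

The resulting dt could then be passed as the dtype argument to
genfromtxt.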