[Numpy-discussion] recfunctions.stack_arrays

Ryan May rmay31@gmail....
Tue Jan 27 15:23:35 CST 2009

Pierre GM wrote:
> [Some background: we're talking about numpy.lib.recfunctions, a set of  
> functions to manipulate structured arrays]
> Ryan,
> If the two files have the same structure, you can use that fact and  
> specify the dtype of the output directly with the dtype parameter of  
> mafromtxt. That way, you're sure that the two arrays will have the  
> same dtype. If you don't know the structure beforehand, you could try  
> to load one array and use its dtype as input of mafromtxt to load the  
> second one.

I could force the dtype.  However, since mafromtxt has the flexibility to infer
it, I'd like to avoid hard-coding the dtype, so I don't have to worry about
updating the code if the file format ever changes (the code parses live data).
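Here is a minimal sketch of Pierre's second suggestion. It still avoids hard-coding the dtype: the parser infers the structure of the first file, and that dtype is then passed in when reading the second. The sketch uses np.genfromtxt(..., usemask=True), which is what mafromtxt wraps (newer NumPy releases point users to that form instead of mafromtxt). The inline sample data and its column names are made up for illustration.

```python
import io

import numpy as np
from numpy.lib.recfunctions import stack_arrays

# Hypothetical sample data standing in for two files with the same layout.
text1 = "a,b,c\n1,1,3.5\n2,0,4.0\n"
text2 = "a,b,c\n4,5,6.0\n7,8,9.5\n"

# Let genfromtxt infer the structure (names and types) of the first file...
first = np.genfromtxt(io.StringIO(text1), delimiter=",", names=True,
                      dtype=None, usemask=True)

# ...then reuse that dtype for the second file, so both arrays match
# without the layout ever being spelled out in the code.
second = np.genfromtxt(io.StringIO(text2), delimiter=",", skip_header=1,
                       dtype=first.dtype, usemask=True)

# With identical dtypes, stacking needs no conversion at all.
stacked = stack_arrays((first, second), usemask=True)
```

If the file format changes, both reads pick up the new structure automatically. The only assumption is that the two files always share the same layout.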

> Now, we could also try to modify stack_arrays so that it would take  
> the largest dtype when several fields have the same name. I'm not  
> completely satisfied by this approach, as it makes dtype conversions  
> under the hood. Maybe we could provide the functionality as an option  
> (w/ a forced_conversion boolean input parameter) ?

I definitely wouldn't advocate magic by default, but I think it would be nice to
be able to get the functionality if one wanted to.  There is one problem I
noticed, however.  I found common_type and lib.mintypecode, but both raise errors
when trying to find a dtype that can hold both bool and float.  I don't know if
there's another function somewhere that would work for what I want.
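On the bool/float question, here is a sketch that assumes a NumPy release newer than this thread. np.promote_types (like np.result_type) returns a dtype able to hold both inputs. stack_arrays also accepts an autoconvert flag that upcasts same-named fields before stacking, which is roughly the optional forced conversion Pierre describes. The arrays are plain ndarrays rather than masked ones, to keep the example short.

```python
import numpy as np
from numpy.lib.recfunctions import stack_arrays

# promote_types accepts bool and float and returns a dtype holding both.
common = np.promote_types(bool, float)
print(common)   # float64

a = np.array([(1, True, 3.0)], dtype=[('a', int), ('b', bool), ('c', float)])
b = np.array([(4, 5.0, 6.0)], dtype=[('a', int), ('b', float), ('c', float)])

# Without autoconvert this raises TypeError because field 'b' differs;
# with it, each shared field is cast to the larger of its dtypes first.
stacked = stack_arrays((a, b), usemask=False, autoconvert=True)
print(stacked['b'])   # [1. 5.]
```

Keeping autoconvert off by default matches the concern above: the conversion happens only when the caller asks for it.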

> I'm a bit surprised by the error message you get. If I try:
>  >>> a = ma.array([(1,2,3)], mask=[(0,1,0)], dtype=[('a',int),
>  ...     ('b',bool), ('c',float)])
>  >>> b = ma.array([(4, 5, 6)], dtype=[('a', int), ('b', float),
>  ...     ('c', float)])
>  >>> test = np.stack_arrays((a, b))
> I get a TypeError instead (the field 'b' doesn't have the same type in a
> and b). I do get the 'two fields w/ the same name' error when I use
> np.merge_arrays (with the flatten option). Could you send a small
> example?

Apparently, the error comes from my use of titles in the dtype to store an
alternate name for each field.  (If you're not familiar with titles, they're
nice because you can get a field by either name: in the example below, a['a']
and a['A'] both return the same one-element field, [1].)  This version of your
case gives me the ValueError:

>>> import numpy.ma as ma
>>> from numpy.lib.recfunctions import stack_arrays
>>> a = ma.array([(1,2,3)], mask=[(0,1,0)], dtype=[(('a','A'),int),
...     (('b','B'),bool), (('c','C'),float)])
>>> b = ma.array([(4,5,6)], dtype=[(('a','A'),int), (('b','B'),float),
...     (('c','C'),float)])
>>> stack_arrays((a,b))
ValueError: two fields with the same name
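Here is a small sketch of how titles show up in a dtype, using plain ndarrays with the same field layout as above. Each titled field appears in dtype.fields under both its title and its name. That may be related to the "two fields with the same name" error, but this is a guess from the dtype layout, not a diagnosis of stack_arrays. strip_titles is a hypothetical workaround, not a numpy function: it copies the data into an equivalent dtype without titles before stacking.

```python
import numpy as np

# In a ((title, name), type) entry, the first string is the title and the
# second is the field name; either one can be used to index the array.
dt = np.dtype([(('a', 'A'), int), (('b', 'B'), bool), (('c', 'C'), float)])
x = np.array([(1, True, 3.0)], dtype=dt)

print(x['a'], x['A'])   # [1] [1]
print(dt.names)         # ('A', 'B', 'C'): names only
print(len(dt.fields))   # 6: every field is keyed by its name and its title


def strip_titles(arr):
    """Hypothetical helper (not part of numpy): copy a structured array into
    an equivalent dtype that keeps the field names but drops the titles."""
    plain = np.dtype([(name, arr.dtype.fields[name][0])
                      for name in arr.dtype.names])
    return arr.astype(plain)


y = strip_titles(x)
print(y.dtype.names, len(y.dtype.fields))   # ('A', 'B', 'C') 3
```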

As a side question, do you have local modifications to your NumPy SVN checkout
that make some of the recfunctions available at numpy's top level?  On mine, I
can't reach them except by importing them from numpy.lib.recfunctions, and I
don't see any mention of recfunctions in lib/__init__.py.
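For reference, here is a sketch of the import path that does work. The rfn alias is only a common convention. The top-level check reflects current NumPy, where these helpers are still not re-exported from the numpy namespace.

```python
import numpy as np
import numpy.lib.recfunctions as rfn

# The recfunctions helpers are not re-exported in the top-level namespace.
print(hasattr(np, "stack_arrays"))   # False

# Reached through the submodule instead: two plain arrays become fields f0, f1.
merged = rfn.merge_arrays((np.array([1, 2]), np.array([10., 20.])),
                          usemask=False)
print(merged.dtype.names)   # ('f0', 'f1')
```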


Ryan May
Graduate Research Assistant
School of Meteorology
University of Oklahoma
