[Numpy-discussion] Reductions and binary ops on recarrays...

josef.pktd@gmai... josef.pktd@gmai...
Thu Jul 30 09:55:03 CDT 2009


2009/7/30 Stéfan van der Walt <stefan@sun.ac.za>:
> 2009/7/30 Fernando Perez <fperez.net@gmail.com>:
>> we recently had a discussion about being able to do some common things
>> like reductions and binary operations on recarrays, and there didn't
>> seem to be much consensus on it being needed in the core of numpy.
>>
>> Since  we do actually need this quite pressingly for everyday tasks,
>> we wrote a very simple version of this today, I'm attaching it here in
>> case it proves useful to others.
>
> I'm in favour of such a patch, but I'd like to see whether we can't do
> it at the C level for structured arrays in general.
>
> Regards
> Stéfan


Do these functions really address a relevant use case of structured
arrays? I haven't seen any examples of multidimensional structured
arrays, but from a quick reading the code doesn't seem to handle mixed
types (it raises an error) or nested structured arrays (I'm not sure),
for which I have seen a lot more examples.


I was looking for, or writing, something similar, but only for 1d
structured arrays, i.e. a 2d dataset.

For homogeneous dtypes it is relatively easy to create a view on which
standard array operations can be applied.
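
A minimal sketch of such a view, assuming a flat, packed structured
array whose fields are all float64. The field names and data are made
up for illustration:

```python
import numpy as np

# Hypothetical 1d structured array (a 2d dataset) with three float64 fields.
data = np.array([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)],
                dtype=[('a', 'f8'), ('b', 'f8'), ('c', 'f8')])

# All fields share one dtype and the dtype is packed, so the same buffer
# can be reinterpreted as a plain 2d float array without copying.
plain = data.view(np.float64).reshape(len(data), -1)

# Standard array operations now work column-wise.
col_means = plain.mean(axis=0)
```

Because this is a view and not a copy, writing to plain also changes
data.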

mixed dtypes
However, I wanted a version that can handle mixed dtypes, in my case
integers and floats, and that upcasts all numerical types to the
highest dtype (floats in my examples). The integers were categorical
data that I want to keep as integers, e.g. for np.bincount.
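
One possible way to do the upcasting, as a sketch only. The helper
name to_common_array and the example fields are hypothetical, and it
assumes a flat (non-nested) structured array with numeric fields only:

```python
import numpy as np

# Hypothetical mixed-dtype data: an integer category, a float, a small int.
data = np.array([(1, 0.5, 2), (0, 1.5, 2), (1, 2.5, 0)],
                dtype=[('group', 'i8'), ('x', 'f8'), ('count', 'i4')])

def to_common_array(sarr):
    """Copy all fields into a 2d array of the common (upcast) dtype."""
    names = sarr.dtype.names
    # Field dtypes, e.g. int64, float64, int32; their common type is float64.
    field_dtypes = [sarr.dtype.fields[name][0] for name in names]
    common = np.result_type(*field_dtypes)
    out = np.empty((len(sarr), len(names)), dtype=common)
    for j, name in enumerate(names):
        out[:, j] = sarr[name]  # each column is cast on assignment
    return out

converted = to_common_array(data)

# The original integer field is untouched and still usable as categories.
group_sizes = np.bincount(data['group'])
```

The result is a copy, so the integer field in the structured array
stays available for functions that need integers.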

temporary/conversion array reuse
When many array operations have to be applied to the data of the
structured array, it is better to keep a converted copy of the
structured array around, instead of doing the conversion each time.
Since it's a copy, though, I used it read-only. For example,
calculating in sequence the mean, variance and correlation, deviations
from the mean and so on requires only one conversion.
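
A sketch of this convert-once pattern, with made-up data and field
names. Marking the copy read-only with setflags is one way to guard
against accidental writes, not necessarily what I had in my own code:

```python
import numpy as np

# Hypothetical data with one float and one integer field.
data = np.array([(1.0, 2), (2.0, 4), (3.0, 7), (4.0, 8)],
                dtype=[('x', 'f8'), ('y', 'i8')])

# Convert once: one float64 column per field (this is a copy).
arr = np.column_stack([data[name].astype(np.float64)
                       for name in data.dtype.names])
arr.setflags(write=False)  # treat the converted copy as read-only

# Several statistics reuse the same converted array.
mean = arr.mean(axis=0)
dev = arr - mean                       # deviations from the mean
var = arr.var(axis=0, ddof=1)          # sample variance per column
corr = np.corrcoef(arr, rowvar=False)  # correlation between columns
```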

For me it would have been more useful to have better documentation and
helper functions to convert structured arrays to standard arrays for
the calculations.

I looked at this mostly for statistical use, where I didn't want the
results to be structured arrays, so these recarrutil functions might
not be of much use in this case; a large part of their functionality,
e.g. the multidimensional overhead, won't be needed.

The recarray helper functions are useful, and built-in support as
Stéfan proposes would be nice.
However, since I only recently started to use them, I'm not sure what
the relevant structure (dimensionality and dtypes) of structured/rec
arrays is. But nested and mixed dtypes seem to be more important than
multidimensionality in the examples I have seen.

For example, when we don't have a balanced panel, so that the
structured array cannot be reshaped into a rectangular shape according
to some variables, then reductions and operations like groupby are
more useful for data analysis:
http://matplotlib.sourceforge.net/api/mlab_api.html#matplotlib.mlab.rec_groupby
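
To illustrate the kind of grouped reduction meant here, a small sketch
using only NumPy (it is not a reimplementation of rec_groupby). The
panel data and field names are invented:

```python
import numpy as np

# Hypothetical unbalanced panel: units 1, 2, 3 have 2, 1 and 3 observations.
panel = np.array([(1, 2000, 1.0), (1, 2001, 3.0),
                  (2, 2000, 10.0),
                  (3, 2000, 4.0), (3, 2001, 6.0), (3, 2002, 8.0)],
                 dtype=[('unit', 'i8'), ('year', 'i8'), ('y', 'f8')])

# Map each row to a group index 0..n_groups-1.
units, inverse = np.unique(panel['unit'], return_inverse=True)

# Grouped count and sum via bincount, then the group means.
counts = np.bincount(inverse)
sums = np.bincount(inverse, weights=panel['y'])
group_means = sums / counts
```

No reshape into a (unit, year) rectangle is needed, so the unequal
number of observations per unit is not a problem.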

my 2c, after a brief look at the code

Josef

