[Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

Gael Varoquaux gael.varoquaux@normalesup....
Mon Jul 12 05:03:30 CDT 2010


On Sun, Jul 11, 2010 at 11:59:30AM +0000, Neil Crighton wrote:
> What is a use case for the new array type that can't be solved by
> structured/record arrays?  Sounds like it was decided at the SciPy
> BOF they were a good idea, several people have implemented a
> version of them and Fernando and Gael have both said they find
> them useful, so they must have something going for them.  Maybe
> Fernando or Gael could share an example where arrays with named
> axes and indices are especially useful, for the peanut gallery's
> benefit?

Because my name is in this e-mail, I feel obliged to answer, but I think
that my use cases and opinions are not any more important than anybody
else's.

Let's say that you have a dataset in a 3D array, where axis 0
corresponds to days, axis 1 to hours of the day, and axis 2 to
temperature. You might want the mean of the temperature in each
day (averaging over the hours), which in current numpy would be:

    data.mean(axis=1)

or the mean of the temperature at every hour, across the different days,
which would be:

    data.mean(axis=0)
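
A quick shape check shows which axis each call collapses. This is only a
sketch: the array sizes (7 days, 24 hours, 3 temperature readings) are
made up for illustration and do not come from a real dataset.

```python
import numpy as np

# Hypothetical dataset: 7 days x 24 hours x 3 temperature readings.
data = np.arange(7 * 24 * 3, dtype=float).reshape(7, 24, 3)

# axis=0 collapses the days: one value per hour (and per reading).
per_hour = data.mean(axis=0)

# axis=1 collapses the hours: one value per day (and per reading).
per_day = data.mean(axis=1)

print(per_hour.shape, per_day.shape)
```

The axis argument names the dimension that disappears, not the one that
is kept, which is exactly the kind of detail that is easy to get backwards.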

I do such manipulations all the time, and keeping track of which axis is
what is fairly tedious and error-prone. It would be much nicer to be able
to write:

    data.ax_day.mean(axis=0)
    data.ax_hour.mean(axis=0)
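
To make the idea concrete, here is a minimal sketch of what name-based
reduction could look like. The `NamedArray` class, its `axis_names`
attribute and the `over` argument are all hypothetical names invented for
this example; they are not the API proposed at the BOF, and the spelling
differs from the `ax_day` form above.

```python
import numpy as np

class NamedArray:
    """Hypothetical wrapper pairing an ndarray with one name per axis."""

    def __init__(self, values, axis_names):
        values = np.asarray(values)
        if values.ndim != len(axis_names):
            raise ValueError("need exactly one name per axis")
        self.values = values
        self.axis_names = tuple(axis_names)

    def mean(self, over):
        # Translate the axis name into its position, then reduce over it.
        idx = self.axis_names.index(over)
        remaining = self.axis_names[:idx] + self.axis_names[idx + 1:]
        return NamedArray(self.values.mean(axis=idx), remaining)

# Made-up sizes: 7 days, 24 hours, 3 readings.
data = NamedArray(np.ones((7, 24, 3)), ("day", "hour", "reading"))
per_day = data.mean(over="hour")   # average over the hours of each day
per_hour = data.mean(over="day")   # average over days, for each hour
```

The result carries the names of the axes that remain, so a later
reduction can again be requested by name rather than by position.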

Also, when writing library functions that handle such data, it's quite
easy to introduce errors in the computations through transpositions or
other reorderings. Given a plain array, I have no way of telling which
axis corresponds to what, nor of tracing such errors. If my library has
a convention that each ndarray should have a named axis called 'time',
I know how to do time series analysis on multidimensional data.
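
As a rough illustration of such a convention, the sketch below looks up
the 'time' axis by name, so the function gives the same answer whether or
not the caller has transposed the data. The function name
`remove_time_mean` and the way names are passed as a separate tuple are
assumptions made for this example only.

```python
import numpy as np

def remove_time_mean(values, axis_names):
    """Subtract the mean along the axis named 'time', wherever it sits."""
    if "time" not in axis_names:
        raise ValueError("expected an axis named 'time'")
    t = axis_names.index("time")
    # keepdims=True keeps the reduced axis so the result broadcasts back.
    return values - values.mean(axis=t, keepdims=True)

a = np.random.rand(5, 100)   # laid out as (channel, time)
b = a.T                      # same data, laid out as (time, channel)

out_a = remove_time_mean(a, ("channel", "time"))
out_b = remove_time_mean(b, ("time", "channel"))
```

Here `out_a` and `out_b.T` hold the same values: the transposition no
longer changes the result, because the axis is found by name.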

My 2 cents,

Gaël

