[Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

Keith Goodman kwgoodman@gmail....
Fri Jul 9 15:24:17 CDT 2010

On Fri, Jul 9, 2010 at 1:17 PM, Joshua Holbrook <josh.holbrook@gmail.com> wrote:
> On Fri, Jul 9, 2010 at 11:42 AM, Rob Speer <rspeer@mit.edu> wrote:
>> Now, the one part I've implemented that I just made up instead of
>> looking to the SciPy consensus (because there was no SciPy consensus)
>> was how to refer to multiple labeled axes without repeating ".axis"
>> all over the place. My choice, which I call "magical axis attributes",
>> is to have arr.somelabel == arr.axis.somelabel whenever it doesn't
>> mean something else. This turns the call
>>  arr.axis.country.named['Netherlands'].axis.year[-1]
>> into:
>>  arr.country.named['Netherlands'].year[-1]
>> I got a message from Fernando Perez saying that he didn't like the
>> magical axis attributes, for the expected reason that it's
>> inconsistent. You shouldn't have to refer to your axis differently
>> just because you called it something like "mean". Another problem that
>> just occurred to me is that datarray-using code could break just
>> because DataArray, or even ndarray itself, grew a new method.
>> I like the syntax that magical attributes provide, but I'm willing to
>> consider other options. Here's one:
>> The __getattr__ only does its magic on attribute names that end in
>> "_index" or "_named", which should not conflict with other method
>> names. "arr.foo_index[3]" is the same as "arr.axis.foo[3]".
>> Furthermore, "arr.foo_named['bar']" is the same as
>> "arr.axis.foo.named['bar']". Then the above lookup becomes:
>>  arr.country_named['Netherlands'].year_index[-1]
>> I don't find this as appealing as magical attributes, but perhaps it's
>> more responsible. I'd like to know what other people think, so let me
>> summarize and name the existing proposals:
>> arr.axis.country.named['Netherlands'].axis.year[-1]  # the default option -- works in any case
>> arr[ arr.aix.country.named['Netherlands'].year[-1] ]  # the "stuple" option
>> arr.country.named['Netherlands'].year[-1]  # the "magical" option
>> arr.country_named['Netherlands'].year_index[-1]  # the "semi-magical" option
>> -- Rob
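
The "semi-magical" option described above can be sketched in a few lines. This is a toy illustration only, not datarray's implementation: ToyLabeledArray, _AxisView and _take are hypothetical names, and the data is a plain nested list rather than an ndarray. The one point it is meant to show is that __getattr__ reacts only to names ending in "_index" or "_named", so an axis label cannot shadow a real method.

```python
def _take(data, depth, pos):
    """Pick position `pos` along the axis that sits `depth` levels down."""
    if depth == 0:
        return data[pos]
    return [_take(sub, depth - 1, pos) for sub in data]


class _AxisView:
    """What arr.<label>_index or arr.<label>_named hands back (hypothetical)."""

    def __init__(self, arr, label, by_tick):
        self._arr = arr
        self._label = label
        self._by_tick = by_tick

    def __getitem__(self, key):
        arr = self._arr
        k = arr.labels.index(self._label)
        # "_named" looks the key up among the ticks; "_index" uses it as a position.
        pos = arr.ticks[k].index(key) if self._by_tick else key
        data = _take(arr.data, k, pos)
        labels = arr.labels[:k] + arr.labels[k + 1:]
        ticks = arr.ticks[:k] + arr.ticks[k + 1:]
        # Once every axis has been consumed, return the bare element.
        return ToyLabeledArray(data, labels, ticks) if labels else data


class ToyLabeledArray:
    """Toy stand-in for a labeled array; an assumption, not the datarray class."""

    def __init__(self, data, labels, ticks):
        self.data = data
        self.labels = list(labels)
        self.ticks = list(ticks)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so real methods
        # and attributes always win over axis labels.
        for suffix, by_tick in (("_index", False), ("_named", True)):
            if name.endswith(suffix) and name[:-len(suffix)] in self.labels:
                return _AxisView(self, name[:-len(suffix)], by_tick)
        raise AttributeError(name)


arr = ToyLabeledArray(
    [[10, 11, 12], [20, 21, 22]],
    labels=["country", "year"],
    ticks=[["Netherlands", "Belgium"], [2008, 2009, 2010]],
)
value = arr.country_named["Netherlands"].year_index[-1]
```

In this sketch a bare arr.country raises AttributeError, which is the trade-off under discussion: fewer surprises, at the cost of the suffix.
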
>> On Fri, Jul 9, 2010 at 1:39 AM, Rob Speer <rspeer@mit.edu> wrote:
>>> http://github.com/rspeer/datarray represents my best guess at the
>>> SciPy BOF consensus. I recently switched the method of accessing named
>>> ticks from .named() to .named[] based on further discussion here.
>>> My implementation is still missing the case with named ticks but
>>> positional axes, however. That is, you should be able to use .named
>>> directly on the top-level datarray without referring to any axis
>>> labels, to say something like arr.named['Netherlands', 2010], but you
>>> can't yet.
>>> -- Rob
>>> On Thu, Jul 8, 2010 at 11:44 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
>>>> On Thu, Jul 8, 2010 at 1:20 PM, Fernando Perez <fperez.net@gmail.com> wrote:
>>>>> The consensus at the BoF (not that it means it's set in stone, simply
>>>>> that there was a good chance for back-and-forth on the topic with many
>>>>> voices) was that:
>>>>> 1. There are valid use cases for 'integer ticks', i.e. integers that
>>>>> index arbitrarily into an array instead of in 0..N-1 fashion.
>>>>> 2. That having plain arr[0] give anything but the first element in arr
>>>>> would be way too confusing in practice, and likely to cause too many
>>>>> problems.
>>>>> 3. That the best solution to allow integer ticks while retaining
>>>>> 'normal' indexing semantics for integers would be to have
>>>>> arr[int] -> normal indexing
>>>>> arr.something[int] -> tick-based indexing, where an int can mean anything.
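
A minimal sketch of point 3 above, assuming a one-dimensional container with integer ticks. TickedVector and _NamedIndexer are hypothetical names invented for this illustration; the accessor is called .named here only because that is the spelling used elsewhere in this thread, and the BoF summary itself leaves the name open.

```python
class _NamedIndexer:
    """Looks elements up by tick value instead of by position."""

    def __init__(self, values, ticks):
        self._values = values
        self._ticks = ticks

    def __getitem__(self, tick):
        # An integer here is a tick, so it can mean anything.
        return self._values[self._ticks.index(tick)]


class TickedVector:
    """Toy 1-D container: positional indexing plus a separate tick accessor."""

    def __init__(self, values, ticks):
        if len(values) != len(ticks):
            raise ValueError("need exactly one tick per element")
        self.values = list(values)
        self.ticks = list(ticks)

    def __getitem__(self, position):
        # Plain indexing keeps the usual 0..N-1 meaning.
        return self.values[position]

    @property
    def named(self):
        return _NamedIndexer(self.values, self.ticks)


v = TickedVector([1.5, 2.5, 3.5], ticks=[2010, 0, 7])
first_element = v[0]          # position 0
element_at_tick_0 = v.named[0]  # the element whose tick is 0
```

Here v[0] and v.named[0] return different elements, which is exactly the ambiguity the consensus avoids by keeping the two spellings separate.
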
>>>> Has the Scipy 2010 BOF consensus been implemented in anyone's fork? I
>>>> don't understand the indexing so I'd like to try it.
>>>> _______________________________________________
>>>> NumPy-Discussion mailing list
>>>> NumPy-Discussion@scipy.org
>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> I personally find the magic attributes most appealing as well. I don't
> like the pseudo-magic choice. I think what makes the magic attributes
> appealing is that they're so much less verbose than the
> alternatives--that is, axis.row --> row. While the pseudo-magic option is
> conceptually like magic attributes with a decreased chance of conflicts,
> in practice it seems to merely turn that dot into an underscore--that
> is, axis.row --> axis_row.
> We'd still be able to do axis.row as it is, right? (I've been too busy
> being my parents' IT guy to get my hands dirty :( ) Maybe that would
> be the way to go--I mean, you have the option of the nice magic
> attribute action, but if it bothers you or you want your datarray to
> be more robust or whatever, you can use axis.row throughout. Maybe we
> could even have an enable/disable flag? I dunno.
> I almost feel like we should come up with some sort of hypothetical
> case of a datarray that we want to do specific things with, so we can
> talk about how we would do those things with a concrete example. It
> should probably be at least 3d. Maybe I'll mock one up over my lunch
> break.
> Oh, and in case anyone missed this email:
> On Thu, Jul 8, 2010 at 12:55 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
>> What do you think of adding a ticks parameter to DataArray? Would that
>> make sense?
>> Current behavior:
>>>> x = DataArray([[1, 2], [3, 4]], (('row', ['A','B']), ('col', ['C', 'D'])))
>>>> x.axes
>> (Axis(label='row', index=0, ticks=['A', 'B']),
>>  Axis(label='col', index=1, ticks=['C', 'D']))
>> Proposed ticks as separate input parameter:
>>>> x = DataArray([[1, 2], [3, 4]], labels=('row', 'col'), ticks=[['A', 'B'], ['C', 'D']])
>> I think this would make it easier for new users to construct a
>> DataArray with ticks just from looking at the function signature. It
>> would match the function signature of Axis. My use case is to use
>> ticks only and not named axes (at first), so:
>>>> x = DataArray([[1, 2], [3, 4]], labels=None, ticks=[['A', 'B'], ['C', 'D']])
>> instead of the current:
>>>> x = DataArray([[1, 2], [3, 4]], ((None, ['A','B']), (None, ['C', 'D'])))
>> It might also cause fewer typos (parentheses matching) at the command line.
>> I've only made a few DataArrays so I don't understand the
>> ramifications of what I am suggesting.
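
One way to picture the proposal above is as a thin conversion from separate labels/ticks arguments to the tuple-of-pairs form shown under "Current behavior". The helper below is hypothetical (pair_labels_and_ticks is not part of datarray) and makes no claim about how DataArray itself would implement a ticks parameter.

```python
def pair_labels_and_ticks(ndim, labels=None, ticks=None):
    """Zip separate labels and ticks into ((label, ticks), ...) pairs.

    Hypothetical helper: a missing argument is filled with None for every axis.
    """
    labels = [None] * ndim if labels is None else list(labels)
    ticks = [None] * ndim if ticks is None else list(ticks)
    if len(labels) != ndim or len(ticks) != ndim:
        raise ValueError("labels and ticks need one entry per axis")
    return tuple(zip(labels, ticks))


# Both labels and ticks given, as in the proposed signature.
named_axes = pair_labels_and_ticks(
    2, labels=("row", "col"), ticks=[["A", "B"], ["C", "D"]]
)

# Ticks only, no axis labels.
ticks_only = pair_labels_and_ticks(2, labels=None, ticks=[["A", "B"], ["C", "D"]])
```

The two results have the same nested shape as the second argument in the "Current behavior" and "instead of the current" examples quoted above.
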
> I was going to reply to it after I considered its contents but kinda
> forgot until now.
> Anyways: while I like the idea of having ticks that correspond to
> their axis being next to each other as the current behavior goes, I
> find this alternative syntax easier to read, probably due to fewer
> parentheses.
> At any rate, this is definitely worth discussion imo.
> --Josh

I ran into a few more questions while playing with datarrays, so I
started a list:
