[Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes
Thu Jul 8 09:54:58 CDT 2010
On Thu, Jul 8, 2010 at 3:13 AM, Lluís <firstname.lastname@example.org> wrote:
> Rob Speer writes:
>>>>> arr.country.named('Spain').year.named(slice(1994, 2010))
> This looks too verbose to me.
> As axes always have a total order, I'd go for the most compact representation
> (assuming 'country' is the first axis, and 'year' the second one):
> This is my current implementation, which also allows for slices with mixed
> integers and names everywhere.
> I understand this might not be the desired default behaviour, as it requires
> looking into the types of every item in '__getitem__', and this might be a
> performance issue (although my current implementation tries to optimize for the
> case of integer indexes).
> Thus, we can use something in the middle:
> arr.names['Netherlands',2010] # I'd rather go for 'names' instead of 'ticks'
> The default '__getitem__' still has full speed, but accessing the 'named'
> attribute allows indexing along the lines of my previous example, while still
> allowing access by axis name without requiring an explicit 'slice'.
> Although this is not my preferred syntax, I think it is a good compromise, and I
> could always subclass this to redirect the default '__getitem__' into
> Btw, I store the name-to-index translations in an ordered dict (indexed by
> name), such that I can also provide an 'arr.iteritems' method that returns
> tuples with 'name/tick' and the array contents of that index. In the above
> syntax, this would probably be 'arr.<axisname>.iteritems'.
> Another feature I like is being able to translate back and forth from
> names/ticks to integers, which I do through my 'Dimension.__getitem__' method
> (Dimension is the equivalent of datarray's 'Axis').
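A minimal sketch of the two-way translation described above, assuming an ordered dict keyed by tick name. The class body, the `items` method, and the example ticks are hypothetical illustrations, not the quoted implementation or datarray's 'Axis'. The sketch assumes ticks are not themselves integers; with integer ticks such as years, the `isinstance` check below would be ambiguous.

```python
from collections import OrderedDict


class Dimension:
    """Hypothetical stand-in for the 'Dimension' class mentioned above."""

    def __init__(self, name, ticks):
        self.name = name
        self._ticks = list(ticks)
        # Ordered mapping from tick name to integer position on the axis.
        self._index = OrderedDict((tick, i) for i, tick in enumerate(self._ticks))

    def __getitem__(self, key):
        # Integer in, tick name out; tick name in, integer out.
        if isinstance(key, int):
            return self._ticks[key]
        return self._index[key]

    def items(self):
        # (tick, position) pairs in axis order.
        return self._index.items()


country = Dimension('country', ['Spain', 'Netherlands'])
to_int = country['Netherlands']   # name -> integer position
to_name = country[1]              # integer position -> name
```

An 'iteritems'-style method on the array could pair each tick from `items()` with the matching slice of the data.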
> PS: I also have a separation between axes and their naming, meaning that I can
> have a single axis with both 'country' and 'year', such that I would index with
> 'Netherlands-2010' (other examples do make more sense), but still be able to
> access them separately (this reduces the size of the full ndarray, as there is
> no need for so many NaNs to make the ndarray homogeneous in size, and it brings
> the ndarray closer to the structure of the data in the mind of the user).
> Read you,
> "And it's much the same thing with knowledge, for whenever you learn
> something new, the whole world becomes that much richer."
> -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
Isn't this the __getitem__ action we were trying to avoid?
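
For comparison, here is a rough sketch of the 'names' compromise from the quoted message: the default '__getitem__' goes straight to NumPy with no per-item type checks, and only the 'names' accessor pays for the name lookup. `NamedArray`, `_NameIndexer`, and the sample ticks are hypothetical names made up for this sketch, not part of datarray or of the quoted implementation. The exclusive stop on name-based slices is an arbitrary choice here.

```python
import numpy as np


class _NameIndexer:
    """Translates tick names to integer positions, then indexes the array."""

    def __init__(self, data, ticks):
        self._data = data
        self._ticks = ticks  # one {tick: position} dict per axis

    def _translate(self, axis, key):
        lookup = self._ticks[axis]
        if isinstance(key, slice):
            # Name-based slice; the stop is exclusive in this sketch.
            start = None if key.start is None else lookup[key.start]
            stop = None if key.stop is None else lookup[key.stop]
            return slice(start, stop, key.step)
        return lookup[key]

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        return self._data[tuple(self._translate(i, k) for i, k in enumerate(key))]


class NamedArray:
    def __init__(self, data, ticks):
        self.data = np.asarray(data)
        lookups = [{t: i for i, t in enumerate(axis)} for axis in ticks]
        self.names = _NameIndexer(self.data, lookups)

    def __getitem__(self, key):
        # Plain positional indexing: handed to NumPy unchanged.
        return self.data[key]


arr = NamedArray(np.arange(6).reshape(2, 3),
                 [['Spain', 'Netherlands'], [2008, 2009, 2010]])
by_name = arr.names['Netherlands', 2010]   # lookup by tick names
by_slice = arr.names['Spain', 2008:2010]   # name-based slice on 'year'
by_int = arr[1, 2]                         # ordinary integer indexing
```

Whether this avoids the '__getitem__' cost depends on where the type inspection lives; in this sketch it is confined to the accessor.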