[Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes
Bruce Southey
bsouthey@gmail....
Thu Jul 8 21:43:28 CDT 2010
On Thu, Jul 8, 2010 at 5:09 PM, Robert Kern <robert.kern@gmail.com> wrote:
> On Thu, Jul 8, 2010 at 18:00, Bruce Southey <bsouthey@gmail.com> wrote:
>> On Thu, Jul 8, 2010 at 4:39 PM, Rob Speer <rspeer@mit.edu> wrote:
>>>>> Still, I have a question. Did you also agree that this should forcibly index
>>>>> through ticks?
>>>>>
>>>>> arr.something[int] -> tick-based indexing
>>>>>
>>>>
>>>> Yes.
>>>
>>> I feel like people are talking about different things because it's
>>> unclear what the .something is.
>>>
>>> If the .something is an axis name, then no. arr.year[0] should get the
>>> first year in the data, not the data from the "year 0".
>>>
>>> If the .something is the attribute we use for named lookup (such as
>>> ".named"), then yes. arr.named[2006] should get whatever tick is named
>>> 2006 on the first axis.
>>> -- Rob
>>
>> Then how is this not different than a record array?
>
> A record array lets you label exactly one notional "axis" (which isn't
> actually an axis as far as numpy is concerned). This lets you label
> all of the axes in a multidimensional array.
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
> -- Umberto Eco
I based this on the example at:
http://www.scipy.org/RecordArrays
>>> import numpy as np
>>> img = np.array([[(0,0,0), (1,0,0)], [(0,1,0), (0,0,1)]], {'names': ('named','g','b'), 'formats': ('f4', 'f4', 'f4')})
>>> arr = img.view(np.recarray)
>>> arr.named
array([[ 0.,  1.],
       [ 0.,  0.]], dtype=float32)
>>> arr.named[:,1]
array([ 1.,  0.], dtype=float32)
>>> img['named']
array([[ 0.,  1.],
       [ 0.,  0.]], dtype=float32)
>>> arr['named']
array([[ 0.,  1.],
       [ 0.,  0.]], dtype=float32)
I think that we need consistency with ndarrays, such that the first
index refers to the first axis, the second to the second axis, and so
on. This means that the actual axis name is perhaps irrelevant when
indexing and slicing. In fact, I have trouble seeing how you would
refer to a single axis in the multidimensional case without also
addressing the other axes. Take this example from Lluis:
"As axis always have a total order, I'd go for the most compact representation
(assuming 'country' is the first axis, and 'year' the second one):
arr['Netherlands','2010']
"
So, continuing the record array example, you can do this:
>>> arr['named'][0,1]
1.0
But you cannot do this:
>>> arr['named',0,1]
Traceback (most recent call last):
  File "<pyshell#20>", line 1, in <module>
    arr['named',0,1]
  File "E:\Python26\lib\site-packages\numpy\core\records.py", line 453, in __getitem__
    obj = ndarray.__getitem__(self, indx)
ValueError: setting an array element with a sequence.
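For comparison, here is a minimal sketch of the compact, tick-based lookup that Lluis describes, where the first label goes to the first axis and the second label to the second axis. The class name `TickIndexer`, the tick labels other than 'Netherlands' and '2010', and the sample values are all hypothetical; none of this is part of NumPy or of the proposal itself.

```python
import numpy as np


class TickIndexer:
    """Hypothetical wrapper: look up array elements by per-axis tick labels."""

    def __init__(self, data, ticks):
        self.data = np.asarray(data)
        if len(ticks) != self.data.ndim:
            raise ValueError("need one sequence of tick labels per axis")
        # One {label: position} mapping per axis, in axis order.
        self.lookup = [
            {label: pos for pos, label in enumerate(axis_ticks)}
            for axis_ticks in ticks
        ]

    def __getitem__(self, labels):
        if not isinstance(labels, tuple):
            labels = (labels,)
        # The n-th label is always resolved against the n-th axis,
        # mirroring plain ndarray positional indexing.
        positions = tuple(
            self.lookup[axis][label] for axis, label in enumerate(labels)
        )
        return self.data[positions]


# Placeholder data: rows are countries, columns are years.
values = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
named = TickIndexer(values, [("Netherlands", "Spain"), ("2009", "2010")])

cell = named["Netherlands", "2010"]   # same element as values[0, 1]
row = named["Spain"]                  # same row as values[1]
```

Because each label is translated to a position on its own axis before ordinary ndarray indexing takes over, the axis names never enter the lookup, which is the consistency with ndarrays argued for above.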
Bruce