[Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

Skipper Seabold jsseabold@gmail....
Thu Jul 8 13:45:09 CDT 2010


On Thu, Jul 8, 2010 at 2:41 PM, Rob Speer <rspeer@mit.edu> wrote:
> On Thu, Jul 8, 2010 at 2:27 PM, Skipper Seabold <jsseabold@gmail.com> wrote:
>> On Thu, Jul 8, 2010 at 1:35 PM, Rob Speer <rspeer@mit.edu> wrote:
>>> Your labels are unique if you look at them the right way. Here's how I
>>> would represent that in a datarray:
>>> * axis0 = 'city', ['Austin', 'Boston', ...]
>>> * axis1 = 'month', ['January', 'February', ...]
>>> * axis2 = 'year', [1980, 1981, ...]
>>> * axis3 = 'region', ['Northeast', 'South', ...]
>>> * axis4 = 'measurement', ['precipitation', 'temperature']
>>>
>>> and then I'd make a 5-D datarray labeled with [axis0, axis1, axis2,
>>> axis3, axis4].
>>>
>>
>> Yeah, this is what I was thinking I would have to do, but it's still
>> not clear to me (I have trouble trying to think in 5 dimensions...).
>> For instance, what axis holds my actual numeric data?
>>
>> axis4, with a "precipitation" tick?
>
> Yep, that's what I was suggesting. Or you could have two different 4-D
> matrices, one whose values are precipitation and one whose values are
> temperatures.
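
To make the "which axis holds the numbers" point concrete, here is a
minimal sketch in plain NumPy. It is not the datarray API; the `axes`
list, the `locate` helper, the shortened tick lists, and the stored
value are all made up for illustration. The numbers live in the cells
of the array, and the 'measurement' axis only says which kind of
number each cell is.

```python
import numpy as np

# Hypothetical axis names and tick labels, shortened from the example above.
axes = [
    ("city", ["Austin", "Boston"]),
    ("month", ["January", "February"]),
    ("year", [1980, 1981]),
    ("region", ["Northeast", "South"]),
    ("measurement", ["precipitation", "temperature"]),
]

# One cell per combination of ticks; NaN marks combinations with no data
# (for example, Austin is not in the Northeast).
shape = tuple(len(ticks) for _, ticks in axes)
data = np.full(shape, np.nan)


def locate(**ticks):
    """Translate axis_name=tick keyword arguments into a NumPy index tuple.

    Axes that are not mentioned are kept whole (slice(None)).
    """
    index = []
    for name, labels in axes:
        if name in ticks:
            index.append(labels.index(ticks[name]))
        else:
            index.append(slice(None))
    return tuple(index)


# Store one made-up observation in the cell picked out by five ticks.
data[locate(city="Boston", month="January", year=1980,
            region="Northeast", measurement="precipitation")] = 3.2

# Fixing only the 'measurement' tick leaves a 4-D precipitation array,
# which is the "two different 4-D arrays" alternative.
precip = data[locate(measurement="precipitation")]
print(precip.shape)
```

Fixing a tick on one axis drops that axis, so the 5-D layout and the
pair of 4-D arrays hold the same information.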
>
>>> Now I realize not everyone wants to represent their tabular data as a
>>> big tensor that they index every which way, and I think this is one
>>> thing that pandas is for.
>>
>> This is kind of where I would like the divide to be between user and
>> developer.  On top of all of this, I would like to see a __repr__ or
>> something that actually spits out a 2d spreadsheet-looking
>> representation.  It would help me stay sane, I think.  Fernando's nice
>> 3D graphic can only go so far as a mental model (for me at least).
>
> Divisi2 uses a 2D labeled representation as its __str__ -- an example
> is at http://csc.media.mit.edu/docs/divisi2/sparse.html
>
> I could port this onto datarray. I was holding off because I was
> unsure about how to represent the N-d case, but I realize now that
> showing the entries in this kind of 2-D tabular format could actually
> be a really intuitive way to do it.
>

+1.  When you first showed the printed divisi array for the movie data
I definitely had an "aha" moment.
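
For the N-d case, one possible layout (a rough sketch only, not what
Divisi2 or datarray actually print) is to put the last axis across
the top and flatten every other axis into row labels. The function
name `format_table`, the `(name, ticks)` axis description, and the
numbers below are assumptions made for this example.

```python
import itertools

import numpy as np


def format_table(data, axes):
    """Render an N-d array as text: last axis across, all other axes down.

    `axes` is a list of (name, ticks) pairs, one per dimension of `data`.
    """
    *row_axes, (col_name, col_ticks) = axes
    header = [name for name, _ in row_axes]
    header += [f"{col_name}={tick}" for tick in col_ticks]
    rows = [header]
    # Each combination of ticks on the leading axes becomes one row.
    for combo in itertools.product(*(range(len(t)) for _, t in row_axes)):
        labels = [str(ticks[i]) for (_, ticks), i in zip(row_axes, combo)]
        values = [f"{value:g}" for value in data[combo]]
        rows.append(labels + values)
    # Right-align every column to its widest cell.
    widths = [max(len(row[c]) for row in rows) for c in range(len(header))]
    return "\n".join(
        "  ".join(cell.rjust(w) for cell, w in zip(row, widths))
        for row in rows
    )


axes = [
    ("city", ["Austin", "Boston"]),
    ("year", [1980, 1981]),
    ("measurement", ["precip", "temp"]),
]
data = np.arange(8.0).reshape(2, 2, 2)  # placeholder numbers, not real data
table = format_table(data, axes)
print(table)
```

Each row is one (city, year) pair and each column is one measurement,
so a 3-D array reads like a spreadsheet with two label columns.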

>> Mix-ins sound reasonable to me as long as this could easily be
>> accomplished.  I.e., why use csr?  Can you go between others?  Are the
>> sparse matrices reasonably stable given recent activity?  Not
>> rhetorical questions, I don't use sparse matrices much.
>
> These are good questions.
>
> I ended up using PySparse instead of scipy.sparse, because SciPy 0.7's
> sparse matrices weren't ready to support many important operations,
> particularly slicing. SciPy 0.8's sparse matrices look much better,
> and I may transition to using them once it's released.
>
> When planning future features of NumPy, of course, we should assume
> SciPy's sparse matrices do what we want (and possibly fix them if they
> don't).
>
> csr_matrix was just an example. I think there would have to be
> separate classes for labeled csr_matrices, labeled lil_matrices, and
> so on, supporting all the usual methods for converting between them.

Ok, sounds good to me.  Just wanted to make sure.
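
To check my understanding, here is a rough sketch of labels surviving
a format conversion. It simplifies the idea above: instead of one
class per format, it uses a single hypothetical wrapper,
`LabeledSparse`, around whatever scipy.sparse matrix it is given. The
class, its methods, and the sample value are invented for
illustration; only `lil_matrix` and the matrices' `asformat` method
and `format` attribute come from scipy.sparse.

```python
from scipy import sparse


class LabeledSparse:
    """Hypothetical wrapper pairing a scipy.sparse matrix with labels."""

    def __init__(self, matrix, row_labels, col_labels):
        if matrix.shape != (len(row_labels), len(col_labels)):
            raise ValueError("labels do not match the matrix shape")
        self.matrix = matrix
        self.row_labels = list(row_labels)
        self.col_labels = list(col_labels)

    def asformat(self, fmt):
        """Convert the storage format ('csr', 'lil', ...), keeping labels."""
        return LabeledSparse(
            self.matrix.asformat(fmt), self.row_labels, self.col_labels
        )

    def get(self, row, col):
        """Look up one entry by its row and column labels."""
        i = self.row_labels.index(row)
        j = self.col_labels.index(col)
        return self.matrix[i, j]


# Build in LIL (cheap to fill in entry by entry), then convert to CSR.
m = sparse.lil_matrix((2, 2))
m[0, 1] = 5.0  # made-up value
lil = LabeledSparse(m, ["Austin", "Boston"], ["precipitation", "temperature"])
csr = lil.asformat("csr")
print(csr.matrix.format, csr.get("Austin", "temperature"))
```

The labels are stored next to the matrix rather than inside it, so a
conversion only has to carry two lists along.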

Skipper

