[Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes
Wed Jul 7 08:52:37 CDT 2010
On 07/06/2010 01:09 PM, Gael Varoquaux wrote:
> Just to give a data point, my research group and I would be very excited
> at the idea of having Fernando's data arrays in Numpy. We can't offer to
> maintain it, because we are already fairly involved in machine learning
> and neuroimaging specific code, but we would be able to rely on it more
> in our packages, and we love it!
> On Mon, Jul 05, 2010 at 11:31:02PM -0500, Jonathan March wrote:
>> Fernando Perez proposed a NumPy enhancement, an ndarray with named axes,
>> prototyped as DataArray by him, Mike Trumpis, Jonathan Taylor, Matthew
>> Brett, Kilian Koepsell and Stefan van der Walt.
>> At SciPy 2010 on July 1, Fernando convened a BOF (Birds of a Feather)
>> discussion of this proposal.
>> The notes from this BOF can be found at:
>> (linked from the Plans section of http://projects.scipy.org/numpy )
>> HELP NEEDED: Fernando does not have the resources to drive the project
>> beyond this prototype, which already does what he needs. If this is to go
>> anywhere, it needs people to do the work. Please step forward.
>> Visible links
>> 1. http://projects.scipy.org/numpy/wiki/NdarrayWithNamedAxes
>> 2. http://projects.scipy.org/numpy
This is very interesting work, especially if it can be used to extend or
replace the current record arrays (and perhaps structured arrays). If it
cannot, then you really need to make a case for yet another data
structure. As it stands, we will end up with all these unnecessary and
incompatible hybrids rather than a single option; that kind of
competition is not good. I really dislike the current impasse with
numpy's Matrix class and do not wish this to happen again. However, I am
not saying that you cannot create another scikit, rather that there has
to be some consideration if it is to go back into numpy/scipy.
As per Wes's reply in this thread, I really do think that a set of
specific behaviors that are expected of this new data structure needs to
be agreed upon. Speed should not be an issue until the basic
functionality is covered. I think that there are at least the following
concerns that people need to agree on:
1) Indexing, especially as it relates to slicing and broadcasting.
2) Joining data structures - what to do when all data structures have
the same 'metadata' (axes, labels, dtypes) and when each of these
differs. Also, do you allow union (so the result includes all axes,
labels etc. present in any of the data structures) or intersection (keep
only the axes and labels in common) operations?
3) How do you expect basic mathematical operations to work? For example,
what does A + 1 mean if A has different data types like strings?
4) How should this interact with the rest of numpy?
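
To make concerns 2) and 3) concrete, here is a rough sketch using plain
NumPy only. It is not the DataArray prototype's API; the align() helper,
the label arrays and the NaN fill for missing labels are assumptions made
up for illustration. It shows how union and intersection joins give
different results for the same inputs, and what NumPy does today with
A + 1 on a structured array that has a string field.

```python
import numpy as np

# Hypothetical labelled 1-D data: each array carries its own axis labels.
labels_a = np.array(["x", "y", "z"])
values_a = np.array([1.0, 2.0, 3.0])
labels_b = np.array(["y", "z", "w"])
values_b = np.array([10.0, 20.0, 30.0])


def align(labels, values, target):
    """Reindex values onto the target labels; missing labels become NaN.

    The NaN fill is an assumption; a real design would have to pick a rule.
    """
    lookup = dict(zip(labels.tolist(), values.tolist()))
    return np.array([lookup.get(lab, np.nan) for lab in target.tolist()])


# Intersection: keep only the labels present in both inputs.
common = np.intersect1d(labels_a, labels_b)
sum_common = align(labels_a, values_a, common) + align(labels_b, values_b, common)

# Union: keep every label; a label missing from one input yields NaN.
every = np.union1d(labels_a, labels_b)
sum_every = align(labels_a, values_a, every) + align(labels_b, values_b, every)

# Concern 3: a structured array mixing a string field and a float field.
records = np.array([("a", 1.0), ("b", 2.0)],
                   dtype=[("name", "U1"), ("x", "f8")])
try:
    records + 1  # NumPy has no 'add' loop for a structured dtype
    whole_array_add_ok = True
except TypeError:
    whole_array_add_ok = False

# Arithmetic on a single numeric field is well defined.
x_plus_one = records["x"] + 1
```

With union semantics the result has four labels, two of which are NaN;
with intersection it has two labels and no missing values. Whichever rule
is chosen, it would have to be stated up front, as would the answer to
whether A + 1 should raise, skip non-numeric fields, or do something else.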