[Numpy-discussion] The array interface published

Magnus Lie Hetland magnus at hetland.org
Wed Apr 6 02:59:18 CDT 2005


Scott Gilbert <xscottg at yahoo.com>:
>
> 
> --- Magnus Lie Hetland <magnus at hetland.org> wrote:
> > 
> > Do we really have to break backward compatibility in order to add more
> > dimensions to the array module?
> > 
> 
> You're right.  The Python array module could change in a backwards
> compatible way.  Possibly using keyword arguments to specify parameters
> that have never been there before.
> 
> We could probably make sense out of array.insert(), array.append(),
> array.extend(), array.pop(), and array.reverse() by giving those an "axis"
> keyword.  Even array.remove() could be made to work for more dimensions,
> but it probably wouldn't get used often.  Maybe some of these would just
> raise an exception for ndims > 1.

Sure. I guess basically the extend/pop/reverse/etc. methods and the
ndim-functionality would sort of be two quite different ways of using
arrays, so keeping them mutually exclusive doesn't seem like a problem
to me.

This might speak in favour of separating the functionality into two
different classes, but I think there's merit to keeping it gathered,
because this is partly for basic use(rs) who just want to get an array
and do things to it that make sense. Appending to a multidimensional
array (as long as we don't tempt them with an axis keyword) just
doesn't make sense -- so people (hopefully) won't do it.
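
To make the "mutually exclusive" idea concrete, here is a minimal sketch. The ShapedArray class and its shape argument are hypothetical names made up for illustration, not an existing or proposed API. It wraps a flat array.array and refuses list-style growth once there is more than one dimension:

```python
from array import array


class ShapedArray:
    """Hypothetical wrapper: a flat array.array plus a shape tuple."""

    def __init__(self, typecode, values, shape=None):
        self.data = array(typecode, values)
        # Default to a one-dimensional shape covering all items.
        self.shape = (len(self.data),) if shape is None else tuple(shape)
        expected = 1
        for n in self.shape:
            expected *= n
        if expected != len(self.data):
            raise ValueError("shape does not match number of items")

    @property
    def ndim(self):
        return len(self.shape)

    def append(self, value):
        # List-style growth is only allowed for one dimension.
        if self.ndim > 1:
            raise TypeError("append() is only supported for 1-D arrays")
        self.data.append(value)
        self.shape = (len(self.data),)
```

With this sketch, ShapedArray('d', [1.0, 2.0]).append(3.0) works as usual, while calling append() on an instance built with shape=(2, 2) raises TypeError.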

> Then we'd have to add some additional typecodes for complex and a
> few others.

Yeah; the question is how compatible the typecode system is with the
new array protocol -- some overlap and some differences, I believe
(without checking right now)?
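
As a rough illustration of where the two systems overlap, the sketch below derives a protocol-style type string (byte order, kind letter, item size in bytes) from an array-module typecode. The helper name typestr_for and the mapping table are assumptions made for this example; only the integer and float typecodes are covered, and the array module has no complex typecode to map at all.

```python
import sys
from array import array

# Assumed mapping from array-module typecodes to a kind letter:
# 'i' signed integer, 'u' unsigned integer, 'f' floating point.
_KIND = {
    'b': 'i', 'h': 'i', 'i': 'i', 'l': 'i', 'q': 'i',
    'B': 'u', 'H': 'u', 'I': 'u', 'L': 'u', 'Q': 'u',
    'f': 'f', 'd': 'f',
}


def typestr_for(arr):
    """Return a type string such as '<f8' for an array.array instance."""
    kind = _KIND[arr.typecode]
    if arr.itemsize == 1:
        order = '|'  # byte order is irrelevant for single-byte items
    else:
        order = '<' if sys.byteorder == 'little' else '>'
    return f"{order}{kind}{arr.itemsize}"
```

One visible difference: a typecode such as 'l' has a platform-dependent size, whereas the type string always spells the size out, so the same typecode can map to different type strings on different machines.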

So -- this might look a bit like patchwork. But I think we might get that
if we have two modules (or classes) too -- one, called array, with the
existing functionality, and one, called (e.g.) ndarray, with a similar
but incompatible interface... It *may* be better, but I'm not quite
sure I think so.

In my experience (which may be very biased and selective here ;) the
array module isn't exactly among the "hottest" features of Python or
the standard libs. In fact, it seems almost a bit pointless to me. It
claims to have "efficient arrays of numeric values" but is the
efficiency really that great, if you write your code in Python? (Using
lists and psyco would, quite possibly, be just as good, for example.)

So -- at *least* adding the array protocol to it would be doing it a
favour, i.e., making it a useful module, and sort of a prototypical
example of the protocol and such. Adding more dimensions might simply
make it more useful. (I've many times been asked by people how to
create e.g. two-dimensional arrays in Python. It would be nice if
there was actually some basic support for it.)
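
For reference, this is roughly what people end up writing by hand today: a minimal sketch of a two-dimensional view over a flat array.array, stored row by row. The Grid2D class is a hypothetical name used only for illustration.

```python
from array import array


class Grid2D:
    """Hypothetical 2-D view over a flat array.array, stored row by row."""

    def __init__(self, typecode, rows, cols, fill=0):
        self.rows, self.cols = rows, cols
        # One flat buffer holding rows * cols items.
        self.data = array(typecode, [fill]) * (rows * cols)

    def _offset(self, key):
        r, c = key
        if not (0 <= r < self.rows and 0 <= c < self.cols):
            raise IndexError("index out of range")
        # Row-major layout: each row occupies cols consecutive items.
        return r * self.cols + c

    def __getitem__(self, key):
        return self.data[self._offset(key)]

    def __setitem__(self, key, value):
        self.data[self._offset(key)] = value


g = Grid2D('d', 2, 3)
g[1, 2] = 4.5
```

The item at row 1, column 2 lands at flat position 1 * 3 + 2 = 5 in g.data.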

> Under the hood, it would basically be a complete reimplementation,

Sure; except for the (possibly minor?) work involved, I don't see that
this is a problem? (Well... The inherent instability of new code,
perhaps... But still.)

> but maybe that is the way to go...  It does keep the number of array
> modules down.

Yes.

> I wonder which way would meet less resistance in getting accepted in
> the core. I think creating a new ndarray object would be less risk
> of breaking existing applications.

I guess that's true.

> >
> > There may be some issues with, e.g., typecode, but still...
> >
> 
> The .typecode attribute could return the same values it always has.

Sure. But we might end up with, e.g., a constructor that looks almost
exactly like the numpy array() constructor -- but whose typecodes are
different... :/

> The .__array_typestr__ attribute would return the new style values.
> That's confusing, but probably unavoidable.

Yes, if we do use this approach.

If we only allow one-dimensional arrays here (i.e., only add the
protocol to the existing functionality) there might be less confusion?

Oh, I don't know. Having a separate module or class/type might be just
as good an idea. Perhaps I'm just being silly :->

> It would be nice if there was only one set of typecodes for all of
> Python,

Yeah -- or some similar system (using type objects).

> but I think we're stuck with many (array module typecodes, struct
> module typecodes, array protocol typecodes). 

:(

Yes, lots of history here. Oh, well. Not the greatest of problems, I
guess.

But using different typecodes in the explicit user-part of the
ND-array interface in the stdlibs from those in scipy, for example,
seems like a decidedly Bad Idea(tm). So ... that might be a good
enough reason for using a separate ndarray entity, unless there can be
some upward compatibility somehow.

-- 
Magnus Lie Hetland                    Fall seven times, stand up eight
http://hetland.org                                  [Japanese proverb]



