[Numpy-discussion] Introduction

Scott Gilbert xscottg at yahoo.com
Sun Apr 14 04:20:03 CDT 2002

Perry, I've been trying to be persuasive, but I think all I've 
managed to do is to be verbose and annoy you.  Please accept 
my apologies.

I really am sorry this is going as poorly as it is.  I'm doing a lousy
job of getting my point across, and I'd like to turn around the tone
this has taken.  Email always comes off as more antagonistic
than intended.

Finally, my appeal to the fact that you are proposing a standard
was heavy-handed.  I guess I was trying to use that to force
you to consider my position.  It clearly backfired...

I'll try to be more to the point.

Here's what I'm proposing, and it's only a suggestion.

*** I think the requirements for being a general purpose "NDArray" 
can be specified with only the following attributes:

    __array_buffer__    - as buffer object
    __array_shape__     - as tuple of long
    __array_itemsize__  - as int

    __array_stride__    - as tuple of long (get from shape if None)
    __array_offset__    - as int (would default to 0 if not present)

Then anyone who implemented these could work with the same C API for
getting the pointer to memory, shape array, stride array, and item size.  
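
A minimal pure-Python sketch of the attribute list above.  The names
RawArray, default_strides and array_info are hypothetical and not part of
numarray, and the stride default assumes a row-major (C order) layout
measured in bytes, which is only one reading of "get from shape".

```python
class RawArray:
    """Hypothetical object exposing only the proposed attributes."""

    def __init__(self, buffer, shape, itemsize, stride=None, offset=0):
        self.__array_buffer__ = buffer
        self.__array_shape__ = tuple(shape)
        self.__array_itemsize__ = itemsize
        self.__array_stride__ = stride      # None means "derive from shape"
        self.__array_offset__ = offset


def default_strides(shape, itemsize):
    """Byte strides for a packed row-major layout (an assumed default)."""
    strides = []
    step = itemsize
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def array_info(obj):
    """Collect (buffer, shape, strides, itemsize, offset), filling defaults."""
    buffer = obj.__array_buffer__
    shape = tuple(obj.__array_shape__)
    itemsize = obj.__array_itemsize__
    strides = getattr(obj, "__array_stride__", None)
    if strides is None:
        strides = default_strides(shape, itemsize)
    offset = getattr(obj, "__array_offset__", 0)
    return buffer, shape, tuple(strides), itemsize, offset


a = RawArray(bytearray(24), shape=(2, 3), itemsize=4)
print(array_info(a)[1:])
```

Any object carrying the same attributes, whatever its class, would pass
through array_info() unchanged; a C API could do the equivalent lookups.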

The set of operations on a pure "NDArray" is probably pretty minimal
(reshape, transpose/rotate, index arrays?).

So in order to create a full featured "NumArray", a few more attributes
are required:

    __array_itemtype__  - as string?

    __array_endian__    - as 1 char string?  (default to the native endian)

This brings the total up to 4 required attributes, and 3 optional ones 
for a very general purpose array data structure.  (I can think of other 
optional ones, but skip that for now.)
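
To show how the item type and endian attributes might be consumed, here is
a rough sketch.  It assumes the item type is a format character from
Python's struct module and the endian flag is "<" or ">"; neither encoding
is settled above.  TypedArray, native_endian and get_item are hypothetical
names, and stride and offset are ignored to keep it short.

```python
import struct
import sys


class TypedArray:
    """Hypothetical holder adding the two extra 'NumArray' attributes."""

    def __init__(self, buffer, shape, itemtype, endian=None):
        self.__array_buffer__ = buffer
        self.__array_shape__ = tuple(shape)
        self.__array_itemtype__ = itemtype   # assumed: a struct format char
        self.__array_itemsize__ = struct.calcsize("=" + itemtype)
        if endian is not None:
            self.__array_endian__ = endian   # assumed: "<" or ">"


def native_endian():
    """Endian flag of the running machine, used when none is given."""
    return "<" if sys.byteorder == "little" else ">"


def get_item(obj, index):
    """Read one element of a packed row-major array at a tuple index."""
    endian = getattr(obj, "__array_endian__", native_endian())
    itemsize = obj.__array_itemsize__
    flat = 0
    for i, extent in zip(index, obj.__array_shape__):
        flat = flat * extent + i             # row-major flat position
    fmt = endian + obj.__array_itemtype__
    return struct.unpack_from(fmt, obj.__array_buffer__, flat * itemsize)[0]


data = bytearray(struct.pack(">4h", 1, 2, 3, 4))
big = TypedArray(data, shape=(2, 2), itemtype="h", endian=">")
print(get_item(big, (1, 0)))
```

The same bytes read with the opposite endian flag would give a different
value, which is why the flag has to travel with the buffer.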

> All in all you are talking about checking quite a few attributes
> to make sure the object has the interface. And even if it does,
> *why* in the world would we presume that the C functions used by
> numarray would work properly with the object you provide.

Because truthfully arrays are little more than a pointer to memory.

That's like asking "why in the world would we presume memcpy() or 
qsort() would know what to do with your memory?"

> You haven't provided any example (let
> alone a compelling one) of why we should accept any object that
> provides those attributes.

Well, the UFuncs certainly should reject any object that they don't
know how to handle.  I'm currently only addressing what it takes to be
an NDArray/NumArray object.  OTOH, if I can present something to the
UFuncs that looks like a known array type, why wouldn't UFuncs
want to work with it?

Ok, so what does this buy you?  

Well, it probably doesn't buy you personally very much.  Your needs are
already being met by the current implementation.

Ok, so what does this cost you?

A few translations:

    _data       -> __array_buffer__
    _shape      -> __array_shape__
    _strides    -> __array_stride__
    _itemsize   -> __array_itemsize__
    _offset     -> __array_offset__
    _type       -> __array_itemtype__
    _byteswap   -> __array_endian__

This isn't a style criticism.  I'm not just asking you to change your
names.  I'm asking to promote the names to be a "standard interface", much
like these things are in many places in Python.

Also requires some small changes to getNDInfo() and getNumInfo()
so that they can calculate the derived fields (contiguous, aligned, ...).

Also requires some changes to your scripts so that it checks for
the interface rather than the inheritance.
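
A hedged sketch of what checking the interface rather than the inheritance
could look like, together with one of the derived fields.  The helper names
has_array_interface and is_contiguous are hypothetical, the contiguity test
assumes byte strides in row-major order, and alignment is not covered.

```python
REQUIRED = ("__array_buffer__", "__array_shape__", "__array_itemsize__")


def has_array_interface(obj):
    """True if obj carries the required attributes, whatever its class."""
    return all(hasattr(obj, name) for name in REQUIRED)


def is_contiguous(shape, strides, itemsize):
    """Derived field: do the byte strides describe a packed row-major layout?"""
    expected = itemsize
    for extent, stride in zip(reversed(shape), reversed(strides)):
        # A dimension of length 1 is never stepped over, so its stride is free.
        if extent != 1 and stride != expected:
            return False
        expected *= extent
    return True


class Foreign:
    """Unrelated class that happens to provide the attributes."""
    __array_buffer__ = bytearray(24)
    __array_shape__ = (2, 3)
    __array_itemsize__ = 4


print(has_array_interface(Foreign()), has_array_interface(object()))
print(is_contiguous((2, 3), (12, 4), 4), is_contiguous((3, 2), (4, 12), 4))
```

The second is_contiguous() call describes a transposed view of the same
memory, which is why it reports a non-contiguous layout.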

What are the benefits to anyone else?

- Describes how anyone could implement something that looks and acts
like NDArrays or NumArrays.  There are probably a lot of reasons to
want to do this.  I have some reasons that I don't think you value
too much.  I think others would have reasons which I can't imagine too.

- Allows one standard API for getting at the basics of NDArrays/NumArrays

- Allows anyone to easily implement other data types for NumArrays.
The typecode won't match any of your builtin types, but maybe other
third parties could agree on other typecodes for their crazy needs and
share modules.

- Allows me personally to distribute a separate (and simpler)
implementation of NDArrays/NumArrays right now and have the same data
objects work with yours when you're all done.  If I give the UFuncs a
pointer to memory, and the attributes above, why shouldn't it work?

> We're not going to budge until you show us what the hell you are talking
> about.

Am I doing any better?  I am trying.

> You are right on complex ints (that we won't consider them). One
> could take numarray and add them if one wanted and have a more
> extended version. But we won't do it, and we wouldn't support as
> being in what we maintain. It's one of those trade offs.

Is there a way, today, without modifying numarray, for me to use
numarray as a holder for these esoteric data types?  Is that way difficult?
Could it be easier?

I'm not asking numarray to know about my types in its core baseline.  I'm
wondering what it takes to implement new types at all.

> Your example shows nothing about what your
> real needs for the object are.

My real needs are all over the place.  Some of which you've shown me
are solvable with the current implementation of numarray.  Some of
which you've not addressed or said you won't address.

To be explicit:

Here are (at least most of) my _needs_ for array objects:

      - support a wide variety of data types (user defined)
      - have efficient storage
      - support the pickle interface for serialization
      - allow alternate sources of underlying memory
      - have an easy interface for accessing the pieces
        necessary to create C extensions (buffer, shape, stride, ...)
      - completed and reliable in the near term

Here are (at least some of) my _wants_ for array objects:

      - cooperate on some level with other standard array
        modules (once the standard is set)
      - have same API for accessing the pieces (buffer, shape,
        stride, ...) as all standard array modules will.
      - implementation in pure Python so that building extension
        modules is not required until the fast operations present
        in those modules are required.
      - implemented from a standard that is as good as it can be

Here are (at least some of) my _whims_ for array objects:

      - has "windowing" functionality to work efficiently with
        really large files (on any modern platform).
      - alternate implementations for things such as "slicing
        behaviour" (copy on write, reference).

Loosely following your design, I've already written a module that meets 
my "needs".  I was hoping that we could cooperate towards filling in some
of my "wants" (cooperating array modules), and I've brought up my "whims"
because I thought they were interesting possibilities for discussion.

I was going to respond to some of your other remarks, but I've probably
wasted enough of your time.  If you don't respond to this message, I'll
take that as a sign that we just aren't going to see eye to eye on any of 
this, and I won't bother you any more. 

(I'll be half surprised if you even get this message.  From the tone
of your last one, I wouldn't be shocked to find out you've already
added me to your killfile. :-)

No hard feelings,
      -Scott Gilbert
