[Numpy-discussion] The array interface published

Travis Oliphant oliphant at ee.byu.edu
Mon Apr 4 12:16:09 CDT 2005


Michiel Jan Laurens de Hoon wrote:

> Travis Oliphant wrote:
>
>>> 1) To what degree will the new array interface look different to 
>>> users of the existing Numerical Python?
>>
>>
>> Nothing will look different.  For now there is nothing to "install" 
>> so the array interface is just something to expect from other 
>> objects.    The only thing that would be different is in Numeric 24.0: 
>> if a user were to call array(<someobj>) and <someobj> supported the 
>> array interface, then Numeric could return an array (without copying 
>> data). Older versions of Numeric won't benefit from the interface but 
>> won't be harmed either.
>
>
> Very nice. Thanks, Travis.
> I'm not sure what you mean by "the array interface could become part 
> of the Python standard as early as Python 2.5", since there is nothing 
> to install. Or does this mean that Python's array will conform to the 
> array interface?



The latter is what I mean...  I think it is important to have something 
in Python itself that "conforms to the interface."   I wonder if it 
would also be nice to have some protocol slots in the object type so 
that extension writers can avoid converting some objects.     There is 
also the possibility that a very simple N-d array type could be included 
in Python 2.5 that conforms to the interface, if somebody wants to 
champion that.



I think it is important to realize what the array interface is trying to 
accomplish.  From my perspective, I still think it is better for the 
scientific community to build off of a single array object that is "best 
of breed."  The purpose of the array interface is to allow us scientific 
users to share information with other Python extension writers who may 
be wary of requiring scipy.base for their users but who really should be 
able to interoperate with scipy.base arrays.    I'm thinking of 
extensions like wxPython, PIL, and so forth.  



There are also lots of uses for arrays that don't necessarily need the 
complexity of the scipy.base array (or uses that need even more 
types).   At some point we may be able to accommodate dynamic type 
additions to the scipy.base array.  But, right now it requires enough 
work that others may want to design their own simple arrays.  It would 
be very useful if all such arrays could speak to each other in a common 
basic language.
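
To make that concrete, here is a minimal sketch of such a simple array and a consumer that knows nothing about its class. Only __array_shape__ is a name taken from the discussion here; the TinyGrid class, the element_at helper, and the __array_data__ attribute (assumed to be any object exposing the raw bytes) are hypothetical names for illustration, and the items are assumed to be native doubles stored in row-major order.

```python
import array
import struct


class TinyGrid:
    """Hypothetical minimal 2-d array of native doubles, row-major."""

    def __init__(self, rows, cols, values):
        if len(values) != rows * cols:
            raise ValueError("values must contain rows * cols items")
        self.__array_shape__ = (rows, cols)
        # Assumed attribute name: an object exposing the raw bytes.
        self.__array_data__ = array.array("d", values)


def element_at(obj, i, j):
    """Read item (i, j) from any exporter following the same assumptions."""
    rows, cols = obj.__array_shape__
    if not (0 <= i < rows and 0 <= j < cols):
        raise IndexError("index out of range")
    itemsize = struct.calcsize("d")
    # Row-major layout: skip i full rows, then j items within the row.
    offset = (i * cols + j) * itemsize
    return struct.unpack_from("d", obj.__array_data__, offset)[0]


grid = TinyGrid(2, 3, [0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
print(element_at(grid, 1, 2))
```

The consumer depends only on the attributes, not on the exporter's type, which is the sense in which different arrays could share a common basic language.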


The fact that numarray and Numeric arrays can talk to each other more 
seamlessly was not the main goal of the array interface but it is a nice 
side benefit.   I'd still like to see the scientific community use a 
single array.  But, others may not see it that way.  The array interface 
lets us share more easily.


>
> Some comments on the array interface:
>
> 1) The "__array_shape__" method is identical to the existing "shape" 
> method in Numerical Python and numarray (except that "shape" does a 
> little bit better checking, but it can be added easily to 
> "__array_shape__"). To avoid code duplication, it might be better to 
> keep that method. (and rename the other methods for consistency, if 
> desired).


There is no code duplication.  In these cases it is just another name 
for .shape.    What "better checking" are you referring to?

>
> 2) The __array_datalen__ is introduced to get around the 32-bit int 
> limitation of len(). Another option is to fix len() in Python itself, 
> so that it can return  integers larger than 32 bits. So we can avoid 
> adding a new method.


Python len() will never return a 64-bit number on a 32-bit platform.  
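
A short sketch of the problem, under assumptions: HugeBuffer and safe_size are hypothetical names, and __array_datalen__ is taken to be a plain integer attribute giving the data size. In current CPython the ceiling for len() is sys.maxsize rather than a 32-bit int, but the effect is the same: len() cannot report a size above the platform limit, while an ordinary attribute can hold any integer.

```python
import sys


class HugeBuffer:
    """Hypothetical exporter whose data size exceeds what len() can return."""

    def __init__(self, size):
        # Plain attribute: a Python integer of any magnitude is fine here.
        self.__array_datalen__ = size

    def __len__(self):
        return self.__array_datalen__


def safe_size(obj):
    """Prefer __array_datalen__ when present, else fall back to len()."""
    size = getattr(obj, "__array_datalen__", None)
    return len(obj) if size is None else size


big = HugeBuffer(sys.maxsize + 1)
try:
    len(big)
    len_failed = False
except OverflowError:
    len_failed = True  # len() cannot represent a size past the platform limit

print(len_failed, safe_size(big) == sys.maxsize + 1)
```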

>
> 3) Where do default values come from? Is it the responsibility of the 
> extension module writer to find out if the array module implements 
> e.g. __array_strides__, and substitute the default values if it 
> doesn't? If so, I have a slight preference to make all methods 
> required, since it's not a big effort to return the defaults, and 
> there will be more extension modules than array packages (or so I hope).


Optional attributes let modules that care talk to each other on a 
"higher level" without creating noise for simpler extensions.   An 
optional attribute only matters if both the consumer and the exporter 
use it.  The defaults just clarify what is assumed when an attribute 
isn't there.
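
As a rough sketch of how a consumer might handle this, assume that a missing __array_strides__ means a C-contiguous (row-major) layout. The helper names, the two exporter classes, and the itemsize argument are hypothetical; only __array_shape__ and __array_strides__ come from the interface discussion.

```python
def c_contiguous_strides(shape, itemsize):
    """Byte strides for a row-major block with the given shape."""
    strides = []
    step = itemsize
    # The last dimension varies fastest, so walk the shape backwards.
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def get_strides(obj, itemsize):
    """Return the exporter's strides, or the assumed default when absent."""
    shape = tuple(obj.__array_shape__)
    strides = getattr(obj, "__array_strides__", None)
    if strides is None:
        return c_contiguous_strides(shape, itemsize)
    return tuple(strides)


class SimpleExporter:
    """Hypothetical exporter that supplies only the shape."""
    __array_shape__ = (2, 3)


class StridedExporter:
    """Hypothetical exporter with column-major data and explicit strides."""
    __array_shape__ = (2, 3)
    __array_strides__ = (8, 16)


print(get_strides(SimpleExporter(), 8))
print(get_strides(StridedExporter(), 8))
```

A simple exporter never has to mention strides, and a consumer that ignores them still works with every exporter that uses the default layout.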


>
> Whereas the array interface certainly helps extension writers to 
> create an extension module that works with all array implementations, 
> it also enables and perhaps encourages the creation of different array 
> modules, while our original goal was to create a single array module 
> that satisfies the needs of both Numerical Python and numarray users. 
> I still think such a solution would be preferable. 


I agree with you.   I would like a single array module for scientific 
users.  But, satisfying everybody is probably impossible with a single 
array object.    Yes, there could be a proliferation of array objects 
but sometimes we need multiple array objects to learn from each other.   
It's nice to have actual code that implements some idea rather than just 
words in a mailing list. 


The interface  allows us to talk to each other while we learn from each 
other's actual working implementations. 


In a way this is like the old argument between the 1920s-era communists 
and the free-marketers.  The communists say that we should have only one 
company that produces some product because having multiple companies is 
"wasteful" of resources,  while the free-marketers point out that 
satisfying consumers is tricky business, and there is not only "one 
right way to do it."  Therefore,  having multiple companies each trying 
to satisfy consumers actually creates wealth as new and better ideas are 
tried by the different companies.  The successful ideas are emulated by 
the rest.   In mature markets there tends to be a reduction in the number 
of producers while in developing markets there are all kinds of 
companies producing basically the same thing. 


Of course software creates its own issues that aren't addressed by that 
simple analogy, but I think it's been shown repeatedly that good 
interfaces (http, smtp anyone?) create a lot of utility.

> Inconsistencies other than the array interface (e.g. one implements 
> argmax(x) while another implements x.argmax()) may mean that an 
> extension module can work with one array implementation but not with 
> another, even though they both conform to the array interface. We may 
> end up with several array packages (we already have Numerical Python, 
> numarray, and scipy), and extension modules that work with one package 
> and not with another. So in a sense, the array interface is letting 
> the genie out of the bottle.


I think this genie is out of the bottle already.  We need to try and get 
our wishes from it now.

-Travis






