[Numpy-discussion] Latest Array-Interface PEP

Travis Oliphant oliphant at ee.byu.edu
Tue Jan 9 01:00:31 CST 2007


Timothy Hochberg wrote:
>
>
> On 1/6/07, *Travis Oliphant* <oliphant at ee.byu.edu 
> <mailto:oliphant at ee.byu.edu>> wrote:
>
>     Tim Hochberg wrote:
>     > Christopher Barker wrote:
>     >
>     > [SNIP]
>     >
>     >> I think the PEP has far more chances of success if it's seen as a
>     >> request from a variety of package developers, not just the
>     numpy crowd
>     >> (which, after all, already has numpy
>     >>
>     > This seems eminently sensible. Getting a few developers from other
>     > projects on board would help a lot; it might also reveal some
>     > deficiencies to the proposal that we don't see yet.
>     >
>     It would help quite a bit.  Are there any suggestions of who to
>     recruit
>     to review the proposal? 
>
>
> Before I can answer that, I need to ask you a question. How do you see 
> this extension to the buffer protocol? Do you see it as a supplement 
> to the earlier array protocol, or do you see it as a replacement?

This is a replacement for the previously described array protocol PEP.  
This is how I'm trying to get the array protocol into Python.

In that vein, it has two purposes:

One is to make a better buffer protocol that includes the concept of an 
N-dimensional array in Python itself.  If we can get this into Python, 
then we get a lot of mileage out of all the people who write extension 
modules that should really be making their memory available as an 
N-dimensional array (every time I turn around there is a new wrapper 
for some library that is *not* using NumPy as the underlying 
extension).  With ctypes available, the situation only gets worse: 
nobody thinks about exposing things as arrays anymore, so NumPy users 
don't get the ease of use they would have if the N-dimensional array 
concept were part of Python itself.

For example, I just found the FreeImage project which wraps a nice 
library using ctypes.  But, it doesn't have a way to expose these images 
as numpy arrays.   Now, it would probably take me only a few hours to 
make the connection between FreeImage and NumPy, but I'd like to see the 
day when it happens without me (or some other NumPy expert) having to do 
all the work.  If ctypes objects exposed the extended buffer protocol 
for appropriate types, then I wouldn't have to do anything: the wrapped 
structures would be exposable as arrays, and all of a sudden I could say

a  = array(freeimobj)

and I can do math on the array in Python.
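
As a rough illustration of what "exposing memory as an N-dimensional 
array" means, here is a minimal sketch. It assumes a present-day Python 
in which ctypes arrays report their shape through memoryview, which is 
not something that can be relied on at the time of this message. The 
`pixels` buffer is a hypothetical stand-in for a bitmap owned by a 
wrapped library; it is not part of FreeImage.

```python
import ctypes

# Hypothetical stand-in for memory owned by a ctypes-wrapped library:
# a 3-row by 4-column image of unsigned bytes.
Image = ctypes.c_ubyte * 4 * 3
pixels = Image()

# memoryview asks the object for its buffer and learns the layout from it.
view = memoryview(pixels)
shape = view.shape        # rows and columns, as reported by the exporter
itemsize = view.itemsize  # bytes per element

# The view shares memory with the ctypes object; no copy is made,
# so a write through ctypes is visible through the view.
pixels[1][2] = 200
raw = view.tobytes()
print(shape, itemsize, raw[1 * 4 + 2])
```

A consumer such as an array library could read the same shape and item 
size and wrap the memory directly, without the wrapper author knowing 
anything about that library.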

Or if I'm an extension module writer, I don't need to have NumPy (or 
rely on it) in order to do some computation on freeimobj in C itself.

Sure, you can do this now if the array protocol is followed, but not 
many people have adopted it yet, and some have argued that it's "not in 
Python itself".  So I guess the big reason I'm pushing this is largely 
marketing.

The buffer protocol is the "right" place to put the array protocol.

The second reason is to ensure that the buffer protocol itself doesn't 
"disappear" in Python 3000.  Not all of the Python devs seem to see its 
value, although it can be hard to tell what the attitudes really are.


>     >          2. Is there any type besides Py_STRUCTURE that can have
>     names
>     >             and fields. If so, what and what do they mean. If
>     not, you
>     >             should just say that.
>     >
>     Yes, you can add fields to a multi-byte primitive if you want.  This
>     would be similar to thinking about the data-format as a C-like union.
>     Perhaps the data-field has meaning as a 4-byte integer but the
>     most-significant and least-significant bytes should also be
>     addressable
>     individually. 
>
>
> Hmm. I think I understand this somewhat better now, but I can't decide 
> if it's cool or overkill. Is this supporting a feature that ctypes has?
I don't know.  It's basically a situation where it's easier to support 
it than not to, so it's there.
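
To make the union analogy concrete, here is a minimal sketch using 
ctypes.Union. It only illustrates the idea of a 4-byte integer whose 
individual bytes are also addressable; the class and field names are 
hypothetical, and this is not the data-format object from the PEP.

```python
import ctypes
import sys

class Word(ctypes.Union):
    # Both members overlay the same 4 bytes of memory.
    _fields_ = [
        ("value", ctypes.c_uint32),      # the data seen as one integer
        ("parts", ctypes.c_uint8 * 4),   # the same data, byte by byte
    ]

w = Word()
w.value = 0x11223344

# Which index holds the least-significant byte depends on byte order.
lsb_index = 0 if sys.byteorder == "little" else 3
msb_index = 3 - lsb_index

lsb = w.parts[lsb_index]
msb = w.parts[msb_index]
print(hex(lsb), hex(msb))
```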

>  
>
>     >          3. And on this topic, why a tuple of ([names,..],
>     {field})? Why
>     >             not simply a list of (name, dfobject, offset, meta) for
>     >             example? And what's the meta information if it's not
>     PyNone?
>     >             Just a string? Anything at all?
>     >
>
>     The list of names is useful for having an ordered list so you can
>     traverse the structure in field order.   It is technically not
>     necessary
>     but it makes it a lot easier to parse a data-format object in offset
>     order (it is used a bit in NumPy, for example).
>
>
> Right, I got that. Between names and field you are simulating an 
> ordered dict. What I still don't understand is why you chose to 
> simulate this ordered dict using a list plus a dictionary rather than 
> a list of tuples. This may well just be a matter of taste. However, 
> for the small sizes I'd expect of these lists I would expect a list of 
> tuples would perform better than the dictionary solution.
Ah.  I misunderstood.  You are right that if I had considered needing an 
ordered list of names up front, a list of tuples would make more sense.  
I think the reason for the choice of dictionary is that I was thinking 
of field access as attribute lookup, which is just dictionary lookup.  
So, conceptually, that was easier for me.  But tuples probably have less 
overhead (especially for small numbers of fields), at the expense of 
having to search for the field name on field access.  

But, I'm trusting that dictionaries (especially small ones) are pretty 
optimized in Python (I haven't tested that assertion in this particular 
case, though). 
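
The two layouts being compared can be sketched as follows. This is only 
an illustration under assumed names: the field names, the format 
strings, and the helper functions are hypothetical and are not taken 
from the PEP.

```python
# Layout A: an ordered list of names plus a dict of name -> (format, offset).
names = ["x", "y", "flag"]
fields = {"x": ("f8", 0), "y": ("f8", 8), "flag": ("u1", 16)}

# Layout B: a single list of (name, format, offset) tuples.
records = [("x", "f8", 0), ("y", "f8", 8), ("flag", "u1", 16)]

def lookup_a(name):
    # Field access is a single dictionary lookup.
    return fields[name]

def lookup_b(name):
    # Field access is a linear search through the tuples.
    for field_name, fmt, offset in records:
        if field_name == name:
            return fmt, offset
    raise KeyError(name)

# Walking layout A in field order needs the separate list of names;
# layout B is already in field order.
in_order_a = [(n,) + fields[n] for n in names]
print(in_order_a == records, lookup_a("flag"), lookup_b("flag"))
```

Both layouts carry the same information; the difference is whether the 
cost falls on lookup by name or on keeping two containers in sync.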

>
> FWIW, the array protocol PEP seems more relevant to what I do since 
> I'm not concerned so much with the overhead since I'm sending big 
> chunks of data back and forth.

This proposal is trying to get the array protocol *into* Python.  So, 
this is the array protocol PEP.  Anyone supportive of the array protocol 
should be interested in and thinking about this PEP.


-Travis


