[Numpy-discussion] Alternate C-only array protocol for speed?

Todd Miller jmiller at stsci.edu
Fri Apr 8 10:14:04 CDT 2005


On Fri, 2005-04-08 at 04:21, David M. Cooke wrote:
> It seems that people are worried about speed of the attribute-based
> array interface when using small arrays in C.

I was a little worried too,  but think the array protocol idea is a good
one in any case.  Thinking about this,  I'm wondering if what we used to
do in early numarray (0.2) wouldn't work here.  Our "consumer interface"
/ helper function looked more like this:

int getSimpleCArray(PyObject *o, SimpleCArray *info);

It basically just fills in the caller's SimpleCArray struct using
information from o and returns 0, or -1 with an exception set if there's
some problem.  In numarray's SimpleCArray struct,  the shape and strides
arrays were fully allocated (i.e. Py_LONG_LONG shape[MAXDIM];) so the
struct could be placed in an auto variable with nothing to free() later.
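
A minimal sketch of that fixed-capacity layout, not the actual numarray
code: MAXDIM's value, the field order, the use of long long in place of
Py_LONG_LONG, and the helper fill_contiguous() are all assumptions made
for illustration. It shows why nothing needs to be freed: shape and
strides live inside the struct itself.

```c
#include <assert.h>
#include <stddef.h>

#define MAXDIM 40  /* assumed capacity; placeholder value */

/* Hypothetical layout; long long stands in for Py_LONG_LONG. */
typedef struct {
    int nd;
    long long shape[MAXDIM];    /* stored inline, not heap-allocated */
    long long strides[MAXDIM];  /* byte strides, stored inline */
    char typecode;              /* placeholder type character */
    void *data;
} SimpleCArray;

/* Fill the caller's struct for a C-contiguous buffer.
   Returns 0 on success, -1 if nd does not fit in MAXDIM. */
int fill_contiguous(SimpleCArray *info, void *data, int nd,
                    const long long *shape, char typecode,
                    long long itemsize)
{
    int i;
    long long stride = itemsize;

    if (nd < 0 || nd > MAXDIM)
        return -1;
    info->nd = nd;
    info->typecode = typecode;
    info->data = data;
    /* The last axis varies fastest, so walk the axes backwards. */
    for (i = nd - 1; i >= 0; i--) {
        info->shape[i] = shape[i];
        info->strides[i] = stride;
        stride *= shape[i];
    }
    return 0;
}

/* Usage: the struct is an auto variable, so there is no cleanup. */
long long demo_row_stride(void)
{
    static double buf[6];
    long long shape[2] = {2, 3};
    SimpleCArray info;

    if (fill_contiguous(&info, buf, 2, shape, 'f',
                        (long long)sizeof(double)) != 0)
        return -1;
    assert(info.nd == 2);
    return info.strides[0];  /* bytes between consecutive rows */
}
```

The cost is a larger struct on the stack for every call; the benefit is
that error paths in the caller cannot leak.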

In this interface,  there is no implied getattr at all,  since the
helper function getSimpleCArray() can be made as smart (i.e. given
knowledge about specific types) as people are motivated to make it. 
So,  for a Numeric array or a numarray or a Numeric3 array, 
getSimpleCArray would presumably just copy from struct to struct,  but
for other types,  it might fall back on the many-getattr approach.

Regards,
Todd

> Here's an alternative: Define some attribute (for now, call it
> __array_c__), which returns a CObject whose value (which you get with
> PyCObject_AsVoidPtr) would be a pointer to a struct describing the
> array. It would look something like
> 
> typedef struct {
>     int version;
>     int nd;
>     Py_LONG_LONG *shape;
>     char typecode;
>     Py_LONG_LONG *strides;
>     Py_LONG_LONG offset;
>     void *data;
> } SimpleCArray;
> 
> (The order here follows that of the array interface spec; if anybody
> has comments on what mixing ints, Py_LONG_LONGs, and chars in a struct
> does to the packing, and on potential alignment problems, I'd like to
> know.)
> 
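On the packing question, a small sketch that lets the compiler answer
for a given platform. A C compiler inserts padding so that each member
is suitably aligned, so mixing these types is safe within one compiler
and ABI; the struct simply ends up larger than the sum of its members.
long long stands in for Py_LONG_LONG here, and print_layout() and
unpadded_size() are hypothetical helper names.

```c
#include <stddef.h>
#include <stdio.h>

/* Same member order as the proposed struct. */
typedef struct {
    int version;
    int nd;
    long long *shape;
    char typecode;
    long long *strides;
    long long offset;
    void *data;
} SimpleCArray;

/* Sum of the member sizes, i.e. the size with no padding at all. */
size_t unpadded_size(void)
{
    return 2 * sizeof(int) + 2 * sizeof(long long *) + sizeof(char)
         + sizeof(long long) + sizeof(void *);
}

/* Print where the compiler actually placed each member. */
void print_layout(void)
{
    printf("version  at %lu\n", (unsigned long)offsetof(SimpleCArray, version));
    printf("nd       at %lu\n", (unsigned long)offsetof(SimpleCArray, nd));
    printf("shape    at %lu\n", (unsigned long)offsetof(SimpleCArray, shape));
    printf("typecode at %lu\n", (unsigned long)offsetof(SimpleCArray, typecode));
    printf("strides  at %lu\n", (unsigned long)offsetof(SimpleCArray, strides));
    printf("offset   at %lu\n", (unsigned long)offsetof(SimpleCArray, offset));
    printf("data     at %lu\n", (unsigned long)offsetof(SimpleCArray, data));
    printf("sizeof = %lu, unpadded = %lu\n",
           (unsigned long)sizeof(SimpleCArray),
           (unsigned long)unpadded_size());
}
```

The lone char in the middle is where padding is most likely to appear.
Moving typecode to the end would usually shrink the struct, at the
price of no longer matching the order of the array interface spec.
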
> version is there as a sanity check: I'd say for this version it's
> something like 0xDECAF ('cause it's lightweight, see ;-). It's primarily
> a check that you've got the right thing (since CObjects are
> intrinsically opaque types).
> 
> Then:
> - the array object guarantees that the data, etc. remains alive,
>   probably by passing itself as the desc parameter to the CObject.
>   The array data would have to stay at the same location and the same
>   size while the reference is held.
> 
> - typecode follows that of the __array_typestr__ attribute
> 
> - shape and strides are pointers to arrays of at least nd elements.
> 
> - this doesn't handle byteswapped data as-is. Maybe a flags, or endian,
>   member could be added.
> 
> - you can still have the full attribute-based array interface
>   (__array_strides__, etc.) to fall back on. If the typecode is 'V',
>   you'll have to look at __array_descr__.
> 
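A sketch of the consumer side under the rules above, kept free of the
Python headers so it stands alone. In real code the void pointer would
come from the CObject; here it is passed in directly. The function name
simple_carray_item() is hypothetical, long long stands in for
Py_LONG_LONG, and strides and offset are assumed to be in bytes.

```c
#include <stddef.h>

#define SIMPLECARRAY_VERSION 0xDECAF

typedef struct {
    int version;
    int nd;
    long long *shape;
    char typecode;
    long long *strides;
    long long offset;
    void *data;
} SimpleCArray;

/* Return the address of the element at idx[0..nd-1], or NULL if the
   pointer does not look like a SimpleCArray or an index is out of
   range. */
void *simple_carray_item(const void *cobj_ptr, const long long *idx)
{
    const SimpleCArray *ca = (const SimpleCArray *)cobj_ptr;
    char *p;
    int i;

    /* The version field is the only evidence that the opaque pointer
       really is a SimpleCArray. */
    if (ca == NULL || ca->version != SIMPLECARRAY_VERSION)
        return NULL;

    p = (char *)ca->data + ca->offset;
    for (i = 0; i < ca->nd; i++) {
        if (idx[i] < 0 || idx[i] >= ca->shape[i])
            return NULL;
        p += idx[i] * ca->strides[i];
    }
    return p;
}

/* Usage: describe a 2x3 array of doubles and read element (1, 2). */
double demo_item(void)
{
    static double buf[6] = {0.0, 1.0, 2.0, 3.0, 4.0, 5.0};
    long long shape[2] = {2, 3};
    long long strides[2] = {3 * (long long)sizeof(double),
                            (long long)sizeof(double)};
    long long idx[2] = {1, 2};
    SimpleCArray ca;
    double *item;

    ca.version = SIMPLECARRAY_VERSION;
    ca.nd = 2;
    ca.shape = shape;
    ca.typecode = 'f';
    ca.strides = strides;
    ca.offset = 0;
    ca.data = buf;

    item = (double *)simple_carray_item(&ca, idx);
    return item != NULL ? *item : -1.0;
}
```
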
> Creating one from a Numeric PyArrayObject would go like this:
> 
> PyObject *create_SimpleCArray(PyArrayObject *a)
> {
>     int i;
>     PyObject *co;
>     SimpleCArray *ca = PyMem_New(SimpleCArray, 1);
> 
>     ca->version = 0xDECAF;
>     ca->nd = a->nd;
>     /* ca->typecode must also be set from a's type (not shown). */
>     ca->shape = PyMem_New(Py_LONG_LONG, ca->nd);
>     ca->strides = PyMem_New(Py_LONG_LONG, ca->nd);
>     for (i = 0; i < ca->nd; i++) {
>         ca->shape[i] = a->dimensions[i];
>         ca->strides[i] = a->strides[i];
>     }
>     ca->offset = 0;
>     ca->data = a->data;   /* the array's own buffer */
> 
>     /* The CObject holds a reference to a, released in the destructor. */
>     Py_INCREF(a);
>     co = PyCObject_FromVoidPtrAndDesc(ca, a, free_numeric_simplecarray);
>     return co;
> }
> 
> where
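> /* CObject destructors with a desc take (void *cobj, void *desc). */
> void free_numeric_simplecarray(void *cobj, void *desc)
> {
>     SimpleCArray *ca = (SimpleCArray *)cobj;
>     PyMem_Free(ca->shape);
>     PyMem_Free(ca->strides);
>     PyMem_Free(ca);
>     Py_DECREF((PyObject *)desc);
> }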
> 
> Some points:
> - you have to keep the CObject around: destroying it will potentially
>   destroy the array you're looking at.
> - I was thinking that maybe adding a PyObject *owner could make it
>   easier to keep track of the owner; I'm not sure, as the descr argument
>   in CObjects can easily play that role.
> - The creator of the SimpleCArray is free to add elements to the end
>   (as long as they don't affect the padding/alignment of the previous
>   ones: haven't thought about this). You could put the real owner of the
>   array data there, for example (say, if it was wrapping a Blitz++
>   array). Or have a small _strides[30] array at the end, and strides
>   would point to that (saving you a memory allocation).
> 
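A sketch of the "add elements to the end" idea from the last point:
the creator allocates a larger struct whose first member is the
SimpleCArray, and points shape and strides at storage inside the same
block. InlineCArray, inline_carray_new() and the capacity macro are
hypothetical names, long long stands in for Py_LONG_LONG, and plain
malloc/free are used so the example does not need the Python headers.

```c
#include <assert.h>
#include <stdlib.h>

#define INLINE_ND 30  /* capacity, following the _strides[30] suggestion */

typedef struct {
    int version;
    int nd;
    long long *shape;
    char typecode;
    long long *strides;
    long long offset;
    void *data;
} SimpleCArray;

/* Creator-private extension. base comes first, so a pointer to the
   extension is also a valid pointer to the SimpleCArray. */
typedef struct {
    SimpleCArray base;
    long long _shape[INLINE_ND];
    long long _strides[INLINE_ND];
} InlineCArray;

/* One allocation covers the header, shape and strides. */
SimpleCArray *inline_carray_new(int nd, const long long *shape,
                                const long long *strides,
                                char typecode, void *data)
{
    InlineCArray *ext;
    int i;

    if (nd < 0 || nd > INLINE_ND)
        return NULL;
    ext = malloc(sizeof *ext);
    if (ext == NULL)
        return NULL;

    ext->base.version = 0xDECAF;
    ext->base.nd = nd;
    ext->base.typecode = typecode;
    ext->base.offset = 0;
    ext->base.data = data;
    ext->base.shape = ext->_shape;      /* points into the same block */
    ext->base.strides = ext->_strides;
    for (i = 0; i < nd; i++) {
        ext->_shape[i] = shape[i];
        ext->_strides[i] = strides[i];
    }
    return &ext->base;
}

/* Usage: consumers see an ordinary SimpleCArray. */
int demo_inline(void)
{
    static float buf[4];
    long long shape[1] = {4};
    long long strides[1] = {(long long)sizeof(float)};
    SimpleCArray *ca = inline_carray_new(1, shape, strides, 'f', buf);
    int ok;

    if (ca == NULL)
        return 0;
    ok = ca->strides == ((InlineCArray *)ca)->_strides
         && ca->shape[0] == 4;
    free(ca);  /* a single free releases everything */
    return ok;
}
```

Consumers only ever read through the shape and strides pointers, so
they cannot tell whether the arrays were allocated separately or live
at the end of the struct.
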
> This simple C interface would, I think, alleviate many of the worries
> about speed for small arrays, and even for large arrays.