[Numpy-discussion] tiny patch + Playing with strings and my own array descr (PyArray_STRING, PyArray_OBJECT).

Travis Oliphant oliphant.travis at ieee.org
Tue Jun 20 04:24:34 CDT 2006


Matthieu Perrot wrote:
> hi,
>
> I need to handle strings arranged in a numpy array whose data belong to a C
> structure. There are several possible answers to this problem:
>   1) use a numpy array of strings (PyArray_STRING), backed by a (char *)
>   buffer in C. It works as is, but you need to define a maximum size for
>   your strings because the set of strings is contiguous in memory.
>   2) use a numpy array of objects (PyArray_OBJECT), and wrap each «C string»
>   with a python object, using PyStringObject for example. The problem then
>   is that there are as many wrappers as data elements, and I believe the
>   data can't be shared when you create a PyStringObject from a (char *)
>   with PyString_FromStringAndSize, for example.
>
>
> Now I will present a third way, which allows you to use strings with no
> size limit (unlike solution 1) and doesn't create wrappers until you really
> need them (on demand/access).
>
> First, for convenience, we will use the (char **) type in C to build an
> array of string pointers (as suggested in solution 2). Now, the game is to
> make it work with the numpy API, and use it in python through a python array.
> Basically, I want behaviour very similar to arrays of PyObject, where the
> data are not contiguous, only their addresses are. So, the idea is to create
> a new array descr based on PyArray_OBJECT and change its getitem/setitem
> functions to deal with my own data.
>
> I expected numpy to work with this convenient array descr, but it fails
> because PyArray_Scalar (arrayobject.c) doesn't call the descriptor's getitem
> function in the PyArray_OBJECT case; instead it runs 2 lines that were
> copied and pasted from the OBJECT_getitem function. Here is my small patch:
> replace (arrayobject.c:983-984):
>           Py_INCREF(*((PyObject **)data));
>           return *((PyObject **)data);
> with:
>           return descr->f->getitem(data, base);
>
> I played a lot with my new numpy array after this change and noticed that
> many uses work:
>   
This is an interesting solution.  I was not considering it, though, and 
so I'm not surprised you have problems.  You can register new types but 
basing them off of PyArray_OBJECT can be problematic because of the 
special-casing that is done in several places to manage reference counting.
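
To make the special-casing concrete, here is a rough toy model in plain C.
It is only a sketch and does not use the NumPy C API: ToyDescr, TOY_OBJECT,
scalar_special_cased and scalar_dispatched are hypothetical names invented
for illustration. It shows how code that branches on the type number never
reaches a replaced getitem, while code that dispatches through the
descriptor does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: these names imitate the shape of a descriptor with a
   function table, but they are NOT part of the NumPy C API. */
enum { TOY_OBJECT = 0 };

typedef const char *(*toy_getitem)(const void *slot);

typedef struct {
    int type_num;         /* which built-in type this descr claims to be */
    toy_getitem getitem;  /* per-descriptor element accessor */
} ToyDescr;

/* Default "object" accessor: the slot stores a pointer, return it as is. */
const char *object_getitem(const void *slot) {
    return *(const char *const *)slot;
}

/* A replacement accessor, standing in for a user-supplied getitem. */
const char *custom_getitem(const void *slot) {
    (void)slot;  /* a real accessor would interpret the slot contents */
    return "custom";
}

/* Special-cased path: the object behaviour is inlined, so descr->getitem
   is never consulted when the type number says "object". */
const char *scalar_special_cased(const ToyDescr *descr, const void *slot) {
    if (descr->type_num == TOY_OBJECT) {
        return *(const char *const *)slot;
    }
    return descr->getitem(slot);
}

/* Dispatching path: always go through the descriptor's own function. */
const char *scalar_dispatched(const ToyDescr *descr, const void *slot) {
    assert(descr->getitem != NULL);
    return descr->getitem(slot);
}
```

With a descriptor that keeps the object type number but swaps in
custom_getitem, the special-cased function still returns the stored
pointer, and only the dispatching one calls the replacement.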

You are supposed to register your own data-types and get your own 
typenumber.  Then you can define all the functions for the entries as 
you wish.  
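
As a rough illustration of that shape, here is a self-contained toy in
plain C. It is a sketch under stated assumptions, not NumPy code: in NumPy
the registration call is PyArray_RegisterDataType, whereas toy_register,
toy_lookup, ToyDescr, ToyStr and the starting number TOY_USERDEF below are
all made up for this example. The descriptor gets its own type number, and
its getitem builds a view of a (char **) entry only when it is accessed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_TYPES 8
#define TOY_USERDEF 256  /* arbitrary first user type number for this toy */

/* View created on access; it points into the C buffer, no copy is made. */
typedef struct {
    const char *data;
    size_t len;
} ToyStr;

typedef ToyStr (*toy_getitem)(const void *slot);

typedef struct {
    int type_num;         /* assigned by toy_register */
    toy_getitem getitem;  /* element accessor owned by this type */
} ToyDescr;

static ToyDescr *toy_registry[TOY_MAX_TYPES];
static int toy_count = 0;

/* Register a descriptor and hand back its own type number, or -1. */
int toy_register(ToyDescr *descr) {
    if (descr == NULL || descr->getitem == NULL ||
        toy_count >= TOY_MAX_TYPES) {
        return -1;
    }
    descr->type_num = TOY_USERDEF + toy_count;
    toy_registry[toy_count++] = descr;
    return descr->type_num;
}

/* Find a registered descriptor by type number, or NULL if unknown. */
ToyDescr *toy_lookup(int type_num) {
    int idx = type_num - TOY_USERDEF;
    if (idx < 0 || idx >= toy_count) {
        return NULL;
    }
    return toy_registry[idx];
}

/* getitem for (char **) storage: each slot holds a pointer to a string. */
ToyStr cstring_getitem(const void *slot) {
    const char *s = *(const char *const *)slot;
    ToyStr view = { s, s ? strlen(s) : 0 };
    return view;
}

/* Element access always goes through the descriptor's own getitem. */
ToyStr toy_array_get(const ToyDescr *descr, char **items, size_t n,
                     size_t i) {
    assert(i < n);
    return descr->getitem(&items[i]);
}
```

Because the type has its own number, nothing in this toy confuses it with
the object type, and the strings can have any length since only their
addresses are stored contiguously.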

Riding on the back of PyArray_OBJECT may work if you are clever, but it 
may fail mysteriously as well because of a reference count snafu.

Thanks for the tests and bug-reports.  I have no problem changing the 
code as you suggest.

-Travis




