[Numpy-discussion] tiny patch + Playing with strings and my own array descr (PyArray_STRING, PyArray_OBJECT).
oliphant.travis at ieee.org
Tue Jun 20 04:24:34 CDT 2006
Matthieu Perrot wrote:
> I need to handle strings through a numpy array whose data belong to a C
> structure. There are several possible answers to this problem:
> 1) use a numpy array of strings (PyArray_STRING) and so a (char *) buffer
> in C. This works as is, but you need to define a maximum size for your
> strings because the set of strings is contiguous in memory.
> 2) use a numpy array of objects (PyArray_OBJECT), and wrap each «C string»
> with a python object, using PyStringObject for example. The problem then is
> that there are as many wrappers as data elements, and I believe the data
> can't be shared when you create a PyStringObject from a (char *) with
> PyString_FromStringAndSize, for example.
> Now I will present a third way, which lets you use strings that are not
> size-limited (as they are in solution 1) and does not create wrappers
> before you really need them (on demand/access).
> First, for convenience, we will use the (char **) type in C to build an
> array of string pointers (as was suggested in solution 2). Now, the game is
> to make it work with the numpy API, and use it in python through a python
> array. Basically, I want behaviour very similar to that of arrays of
> PyObject, where the data are not contiguous, only their addresses are. So
> the idea is to create a new array descr based on PyArray_OBJECT and change
> its getitem/setitem functions to deal with my own data.
> I expected numpy to work with this convenient array descr, but it fails
> because PyArray_Scalar (arrayobject.c) doesn't call the descriptor's getitem
> function in the PyArray_OBJECT case; instead it runs 2 lines that have been
> copied and pasted from the OBJECT_getitem function. Here is my small patch:
> replace (arrayobject.c:983-984):
> Py_INCREF(*((PyObject **)data));
> return *((PyObject **)data);
> by :
> return descr->f->getitem(data, base);
> I played a lot with my new numpy array after this change and noticed that a
> lot of uses work:
This is an interesting solution. I was not considering it, though, and
so I'm not surprised you have problems. You can register new types but
basing them off of PyArray_OBJECT can be problematic because of the
special-casing that is done in several places to manage reference counting.
You are supposed to register your own data-types and get your own
typenumber. Then you can define all the functions for the entries as
Riding on the back of PyArray_OBJECT may work if you are clever, but it
may fail mysteriously as well because of a reference count snafu.
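
As a rough illustration of defining the per-entry functions for a type of
your own, here is a plain-C model that deliberately avoids the NumPy headers.
StrPtrFuncs, strptr_getitem and strptr_setitem are hypothetical names, not
part of the NumPy API. The sketch only assumes that each array slot holds a
char * and that a wrapper (here, a heap copy) is built when an element is read.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a per-type function table; not NumPy's struct. */
typedef struct {
    char *(*getitem)(const void *slot);            /* build a wrapper on access */
    int   (*setitem)(void *slot, const char *src); /* store a value in a slot   */
} StrPtrFuncs;

/* Each slot holds a char *. Return a fresh heap copy of the string it points
 * to; the caller must free() it. Returns NULL for an empty slot or on
 * allocation failure. */
static char *strptr_getitem(const void *slot)
{
    const char *s = *(char *const *)slot;
    if (s == NULL) {
        return NULL;
    }
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy != NULL) {
        memcpy(copy, s, n);
    }
    return copy;
}

/* Replace the string owned by a slot with a copy of src.
 * Returns 0 on success and -1 on allocation failure. */
static int strptr_setitem(void *slot, const char *src)
{
    size_t n = strlen(src) + 1;
    char *copy = malloc(n);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, src, n);
    free(*(char **)slot);   /* release the previous string; free(NULL) is a no-op */
    *(char **)slot = copy;
    return 0;
}

/* The table a custom type would carry instead of borrowing the object type's. */
static const StrPtrFuncs strptr_funcs = { strptr_getitem, strptr_setitem };
```

In this model only the addresses are contiguous (a char *slots[n] array), the
strings have no fixed maximum size, and nothing is reference-counted: the
caller frees whatever getitem returns.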
Thanks for the tests and bug-reports. I have no problem changing the
code as you suggest.
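
The effect of the suggested change can be sketched in plain C, without the
NumPy headers. MiniDescr, Record, scalar_inlined and scalar_dispatched are
hypothetical names made up for this illustration; the real PyArray_Scalar
does far more. The sketch only assumes that a descriptor carries a getitem
hook and that a derived type overrides it.

```c
#include <stddef.h>

/* Hypothetical miniature of a descriptor: just a getitem hook. */
typedef struct MiniDescr {
    const char *(*getitem)(const void *data);
} MiniDescr;

/* Default object-like behaviour: the slot holds the address of the value. */
static const char *slot_getitem(const void *data)
{
    return (const char *)*(const void *const *)data;
}

/* A derived type: the slot holds the address of a record, and the value
 * to hand back is the record's name. */
typedef struct {
    int id;
    const char *name;
} Record;

static const char *record_getitem(const void *data)
{
    const Record *r = *(const void *const *)data;
    return r->name;
}

/* Before the change: the object case is inlined, so the hook is ignored
 * and the raw slot content is returned whatever the descriptor says. */
static const char *scalar_inlined(const MiniDescr *descr, const void *data)
{
    (void)descr;
    return (const char *)*(const void *const *)data;
}

/* After the change: always go through the descriptor's own getitem. */
static const char *scalar_dispatched(const MiniDescr *descr, const void *data)
{
    return descr->getitem(data);
}
```

For the default hook both paths return the same pointer. For the derived
type, only the dispatched path reaches record_getitem; the inlined path hands
back the address of the Record itself.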