[Numpy-discussion] tiny patch + Playing with strings and my own array descr (PyArray_STRING, PyArray_OBJECT).

Matthieu Perrot perrot at shfj.cea.fr
Fri Jun 16 13:01:31 CDT 2006


I need to handle strings arranged as a numpy array whose data belongs to a C
structure. There are several possible answers to this problem:
  1) use a numpy array of strings (PyArray_STRING), and so a (char *) object
  in C. It works as is, but you need to define a maximum size for your strings
  because your set of strings is contiguous in memory.
  2) use a numpy array of objects (PyArray_OBJECT), and wrap each «C string»
  with a python object, using PyStringObject for example. The problem then is
  that there are as many wrappers as data elements, and I believe data can't
  be shared when you create a PyStringObject from a (char *) with
  PyString_FromStringAndSize, for example.
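
To make the difference between the two memory layouts concrete, here is a
minimal sketch in Python 3 using only ctypes (no numpy involved). WIDTH, COUNT
and set_fixed are names made up for this illustration; the real
PyArray_STRING machinery may differ in its details.

```python
import ctypes

WIDTH = 8   # assumed maximum string size, chosen only for this illustration
COUNT = 2

# Solution 1 layout: one contiguous block of fixed-width slots,
# comparable to `char buf[2][8]` in C.
fixed = ((ctypes.c_char * WIDTH) * COUNT)()

def set_fixed(slots, i, text):
    """Store text in slot i; anything beyond WIDTH bytes is cut off."""
    slots[i].raw = text[:WIDTH].ljust(WIDTH, b"\0")

set_fixed(fixed, 0, b"plop")
set_fixed(fixed, 1, b"a much longer string")   # silently truncated

# Solution 2 layout: the array holds only addresses, comparable to a
# C (char **); each string lives elsewhere and may have any length.
pointers = (ctypes.c_char_p * COUNT)(b"plop", b"blups")
pointers[1] = b"a much longer string"
```

The fixed-width block costs WIDTH bytes per element whatever the string
length, while the pointer array costs one pointer per element plus the
strings themselves.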

Now, I will present a third way, which allows you to use strings with no size
limit (unlike solution 1) and doesn't create wrappers until you really need
them (on demand/access).

First, for convenience, we will use the (char **) type in C to build an array
of string pointers (as suggested in solution 2). Now, the game is to make it
work with the numpy API, and use it in python through a python array.
Basically, I want behaviour very similar to arrays of PyObject, where the
data are not contiguous, only their addresses are. So, the idea is to create
a new array descr based on PyArray_OBJECT and change its getitem/setitem
functions to deal with my own data.
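
The on-demand wrapping idea can be sketched outside numpy in Python 3 with
ctypes. CStringArray is a hypothetical class invented for this illustration,
not part of numpy; its __getitem__ and __setitem__ only play the role that
the descr's getitem/setitem functions would play in C.

```python
import ctypes

class CStringArray:
    """Hypothetical stand-in for an array whose slots are char * pointers."""

    def __init__(self, strings):
        # Each buffer plays the role of a C-owned, NUL-terminated string.
        self._buffers = [ctypes.create_string_buffer(s) for s in strings]
        # The array itself holds only addresses, like a C (char **).
        self._ptrs = (ctypes.c_void_p * len(strings))(
            *[ctypes.addressof(b) for b in self._buffers]
        )

    def __len__(self):
        return len(self._ptrs)

    def __getitem__(self, i):
        # getitem: the Python object is built only when the slot is read.
        return ctypes.string_at(self._ptrs[i])

    def __setitem__(self, i, value):
        # setitem: allocate a new buffer and store its address in the slot.
        buf = ctypes.create_string_buffer(value)
        self._buffers[i] = buf
        self._ptrs[i] = ctypes.addressof(buf)

a = CStringArray([b"plop", b"blups"])
a[0] = b"youpiiii"   # longer than the original string, no size limit
```

No Python string exists for an element until it is accessed, and replacing an
element only changes one address in the pointer array.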

I expected numpy to work with this convenient array descr, but it fails
because PyArray_Scalar (arrayobject.c) doesn't call the descriptor's getitem
function (in the PyArray_OBJECT case) but instead runs 2 lines which have
been copy/pasted from the OBJECT_getitem function. Here is my small patch:
replace (arrayobject.c:983-984):
          Py_INCREF(*((PyObject **)data));
          return *((PyObject **)data);
with:
          return descr->f->getitem(data, base);
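
Why the inlined lines defeat a custom descr can be shown with a small Python 3
sketch. All the names below (ObjectDescr, CStringDescr, scalar_hardcoded,
scalar_dispatch) are hypothetical and only mirror the shape of the C code;
they are not numpy functions.

```python
class ObjectDescr:
    """Default behaviour: the slot already holds the Python object."""

    def getitem(self, data, i):
        return data[i]

class CStringDescr(ObjectDescr):
    """Hypothetical custom descr: the slot holds raw bytes, wrapped on access."""

    def getitem(self, data, i):
        return data[i].decode("ascii")

def scalar_hardcoded(descr, data, i):
    # Mirrors the pre-patch shape: the object behaviour is inlined,
    # so the descr argument is never consulted.
    return data[i]

def scalar_dispatch(descr, data, i):
    # Mirrors the patched shape: always defer to the descriptor.
    return descr.getitem(data, i)

data = [b"plop", b"blups"]
raw = scalar_hardcoded(CStringDescr(), data, 0)      # custom getitem bypassed
wrapped = scalar_dispatch(CStringDescr(), data, 0)   # custom getitem used
```

With the default descr both functions give the same result, which would
explain why the shortcut goes unnoticed until someone overrides getitem.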

I played a lot with my new numpy array after this change and noticed that
many uses work:
>>> a = myArray()
array([["plop", "blups"]], dtype=object)
>>> print a
[["plop", "blups"]]
>>> a[0, 0] = "youpiiii"
>>> print a
[["youpiiii", "blups"]]
>>> s = a[0, 0]
>>> print s
>>> b = a[:] # data is shared with 'a' (similar behaviour to an array of objects)
>>> numpy.zeros(1, dtype = a.dtype)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: fields with object members not yet supported.
>>> numpy.array(a)
segmentation fault

Finally, I found a missing check in multiarraymodule.c (_array_fromobject
function): after the label finish (line 4661), add:
        if (!ret) {
                return Py_None;
        }

After this change, I obtained (when not in interactive mode):
# numpy.array(a)
Exception exceptions.TypeError: 'fields with object members not yet supported.' in 'garbage collection' ignored
Fatal Python error: unexpected exception during garbage collection

But strangely, in interactive mode, the first time it fails and raises an
exception (good behaviour), and the next time it only returns None.
>>> numpy.array(myArray())
TypeError: fields with object members not yet supported.
>>> a=numpy.array(myArray()); print a

A bug remains (I will explore it later), but it is better than before.

This mail shows how to map (char **) onto a numpy array, but it's easy to use
the same idea to handle any type (your_object **).
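
As a rough illustration of the (your_object **) case, here is a Python 3
sketch with ctypes where the slots point to C structures instead of strings.
Point and getitem are hypothetical names chosen for the example; the point is
only that the wrapper returned on access shares its data with the original
structure instead of copying it.

```python
import ctypes

class Point(ctypes.Structure):
    """Hypothetical C structure standing in for your_object."""
    _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]

# The C-side objects, and an array holding only their addresses (Point **).
points = [Point(1, 2), Point(3, 4)]
slots = (ctypes.POINTER(Point) * len(points))(
    *[ctypes.pointer(p) for p in points]
)

def getitem(slots, i):
    # Wrap on access: the returned Point is a view on the pointed-to memory.
    return slots[i].contents

view = getitem(slots, 0)
view.x = 10   # written through to points[0], since no copy was made
```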

I'll be pleased to discuss any comments on the proposed solution, or any
other solutions you can find.

Matthieu Perrot             Tel: +33 1 69 86 78 21
CEA - SHFJ                  Fax: +33 1 69 86 77 86
4, place du General Leclerc
91401 Orsay Cedex France
