[Numpy-discussion] Another reference count leak: ticket #848

Michael Abbott michael@araneidae.co...
Wed Jul 9 03:36:27 CDT 2008


There are three separate patches in this message plus some remarks on 
"stealing" reference counts at the bottom.


On Tue, 8 Jul 2008, Travis E. Oliphant wrote:

> Michael Abbott wrote:
> > On Tue, 8 Jul 2008, Travis E. Oliphant wrote:
> >> The first part of this patch is good.  The second is not needed.   
> > I don't see that.
> Don't forget that PyArray_FromAny consumes the reference even if it 
> returns with an error.

Oh dear.  That's not good.  

Well then, I need to redo my patch.  Here's the new patch for 
..._arrtype_new:


commit 431d99f40ca200201ba59c74a88b0bd972022ff0
Author: Michael Abbott <michael.abbott@diamond.ac.uk>
Date:   Tue Jul 8 10:10:59 2008 +0100

    Another reference leak using PyArray_DescrFromType
    
    This change fixes two issues: a spurious ADDREF on a typecode returned
    from PyArray_DescrFromType and an awkward interaction with PyArray_FromAny.

diff --git a/numpy/core/src/scalartypes.inc.src b/numpy/core/src/scalartypes.inc.src
index 3feefc0..7d3e562 100644
--- a/numpy/core/src/scalartypes.inc.src
+++ b/numpy/core/src/scalartypes.inc.src
@@ -1886,7 +1886,6 @@ static PyObject *
         if (!PyArg_ParseTuple(args, "|O", &obj)) return NULL;
 
     typecode = PyArray_DescrFromType(PyArray_@TYPE@);
-    Py_INCREF(typecode);
     if (obj == NULL) {
 #if @default@ == 0
         char *mem;
@@ -1903,8 +1902,12 @@ static PyObject *
         goto finish;
     }
 
+    Py_XINCREF(typecode);
     arr = PyArray_FromAny(obj, typecode, 0, 0, FORCECAST, NULL);
-    if ((arr==NULL) || (PyArray_NDIM(arr) > 0)) return arr;
+    if ((arr==NULL) || (PyArray_NDIM(arr) > 0)) {
+        Py_XDECREF(typecode);
+        return arr;
+    }
     robj = PyArray_Return((PyArrayObject *)arr);
 
 finish:


I don't think we can dispense with the extra INCREF and DECREF.


Looking at the uses of PyArray_FromAny I can see the motivation for this 
design: core/include/numpy/ndarrayobject.h has a lot of calls which take a 
value returned by PyArray_DescrFromType as argument.  This has prompted me 
to take a trawl through the code to see what else is going on, and I note 
a couple more issues with patches below.


In the patch below the problem being fixed is that the first call to 
PyArray_FromAny can result in the erasure of dtype *before* Py_INCREF is 
called.  Perhaps you can argue that this only occurs when NULL is 
returned...


diff --git a/numpy/core/blasdot/_dotblas.c b/numpy/core/blasdot/_dotblas.c
index e2619b6..0b34ec7 100644
--- a/numpy/core/blasdot/_dotblas.c
+++ b/numpy/core/blasdot/_dotblas.c
@@ -234,9 +234,9 @@ dotblas_matrixproduct(PyObject *dummy, PyObject *args)
     }
 
     dtype = PyArray_DescrFromType(typenum);
-    ap1 = (PyArrayObject *)PyArray_FromAny(op1, dtype, 0, 0, ALIGNED, NULL);
-    if (ap1 == NULL) return NULL;
     Py_INCREF(dtype);
+    ap1 = (PyArrayObject *)PyArray_FromAny(op1, dtype, 0, 0, ALIGNED, NULL);
+    if (ap1 == NULL) { Py_DECREF(dtype); return NULL; }
     ap2 = (PyArrayObject *)PyArray_FromAny(op2, dtype, 0, 0, ALIGNED, NULL);
     if (ap2 == NULL) goto fail;
 

The next patch deals with an interestingly subtle memory leak in 
_string_richcompare where if casting to a common type fails then a 
reference count will leaked.  Actually this one has nothing to do with 
PyArray_FromAny, but I spotted it in passing.


diff --git a/numpy/core/src/arrayobject.c b/numpy/core/src/arrayobject.c
index ee4e945..2294b8d 100644
--- a/numpy/core/src/arrayobject.c
+++ b/numpy/core/src/arrayobject.c
@@ -4715,7 +4715,6 @@ _strings_richcompare(PyArrayObject *self, PyArrayObject *other, int cmp_op,
         PyObject *new;
         if (self->descr->type_num == PyArray_STRING && \
             other->descr->type_num == PyArray_UNICODE) {
-            Py_INCREF(other);
             Py_INCREF(other->descr);
             new = PyArray_FromAny((PyObject *)self, other->descr,
                                   0, 0, 0, NULL);
@@ -4723,16 +4722,17 @@ _strings_richcompare(PyArrayObject *self, PyArrayObject *other, int cmp_op,
                 return NULL;
             }
             self = (PyArrayObject *)new;
+            Py_INCREF(other);
         }
         else if (self->descr->type_num == PyArray_UNICODE &&    \
                  other->descr->type_num == PyArray_STRING) {
-            Py_INCREF(self);
             Py_INCREF(self->descr);
             new = PyArray_FromAny((PyObject *)other, self->descr,
                                   0, 0, 0, NULL);
             if (new == NULL) {
                 return NULL;
             }
+            Py_INCREF(self);
             other = (PyArrayObject *)new;
         }
         else {


I really don't think that this design of reference count handling in 
PyArray_FromAny (and consequently PyArray_CheckFromAny) is a good idea.  
Unfortunately these seem to be part of the published API, so presumably 
it's too late to change this?  (Otherwise I might see how the 
corresponding patch comes out.)

Not only is this not a good idea, it's not documented in the API 
documentation (I'm referring to the "Guide to NumPy" book), although at 
least the inline comment on the implementation of PyArray_FromAny does 
mention that the reference is "stolen".

I've been trying to find some documentation on stealing references.  The 
Python C API reference (http://docs.python.org/api/refcountDetails.html) 
says

	Few functions steal references; the two notable exceptions are 
	PyList_SetItem() and PyTuple_SetItem()

An interesting essay on reference counting is at 
http://lists.blender.org/pipermail/bf-python/2005-September/003092.html


In conclusion, I can't find much about the role of stealing in reference 
count management, but it's such a source of surprise (and frankly doesn't 
actually work out all that well in numpy) that I don't think it's 
justified.  If PyList_SetItem() and PyTuple_SetItem() could remain the 
only examples of this it would be good.


More information about the Numpy-discussion mailing list