[Numpy-discussion] Another reference count leak: ticket #848

Travis E. Oliphant oliphant@enthought....
Fri Jul 18 22:07:24 CDT 2008

> Looking at the uses of PyArray_FromAny I can see the motivation for this 
> design: core/include/numpy/ndarrayobject.h has a lot of calls which take a 
> value returned by PyArray_DescrFromType as argument.  This has prompted me 
> to take a trawl through the code to see what else is going on, and I note 
> a couple more issues with patches below.
The core issue is that NumPy grew out of Numeric.  In Numeric 
PyArray_Descr was just a C-structure, but in NumPy it is now a real 
Python object with reference counting.  Trying to have a compatible 
C-API to the old one and making the transition with out huge changes to 
the C-API is what led to the "stealing" strategy. 

I did not just out the blue decide to do it that way.   Yes, it is a bit 
of a pain, and yes, it isn't the way it is always done, but as you point 
out there are precedents, and so that's the direction I took.  It is 
*way* too late to change that in any meaningful way
> In the patch below the problem being fixed is that the first call to 
> PyArray_FromAny can result in the erasure of dtype *before* Py_INCREF is 
> called.  Perhaps you can argue that this only occurs when NULL is 
> returned...
Indeed I would argue that because the array object holds a reference to 
the typecode (data-type object).  Only if the call returns NULL does the 
data-type object lose it's reference count, and in fact that works out 
rather nicely and avoids a host of extra Py_DECREFs. 
> The next patch deals with an interestingly subtle memory leak in 
> _string_richcompare where if casting to a common type fails then a 
> reference count will leaked.  Actually this one has nothing to do with 
> PyArray_FromAny, but I spotted it in passing.
This is a good catch.  Thanks!
> I really don't think that this design of reference count handling in 
> PyArray_FromAny (and consequently PyArray_CheckFromAny) is a good idea.  
Your point is well noted, but again given the provenance of the code, I 
still think it was the best choice.  And, yes, it is too late to change it.
> Not only is this not a good idea, it's not documented in the API 
> documentation (I'm referring to the "Guide to NumPy" book)
Hmmm... Are you sure?  I remember writing a bit about in the paragraphs 
that describe the relevant API calls.  But, you could be right.
> I've been trying to find some documentation on stealing references.  The 
> Python C API reference (http://docs.python.org/api/refcountDetails.html) 
> says
> 	Few functions steal references; the two notable exceptions are 
> 	PyList_SetItem() and PyTuple_SetItem()
> An interesting essay on reference counting is at 
> http://lists.blender.org/pipermail/bf-python/2005-September/003092.html
Believe me, I understand reference counting pretty well.  Still, it can 
be tricky to do correctly and it is easy to forget corner cases and 
error-returns.   I very much appreciate your careful analysis,  but I 
did an analysis of my own when I wrote the code, and so I will be 
resistant to change things if I can't see the error.

I read something from Guido once that said something to the effect that 
nothing beats studying the code to get reference counting right.   I 
think this is true.
> In conclusion, I can't find much about the role of stealing in reference 
> count management, but it's such a source of surprise (and frankly doesn't 
> actually work out all that well in numpy)
I strongly beg to differ.   This sounds very naive to me.    IMHO it has 
worked out extremely well in converting the PyArray_Descr C-structure 
into the data-type objects that adds so much power to NumPy.  

Yes, there are a few corner cases that you have done an excellent job in 
digging up, but they are "corner" cases that don't cause problems for 
99.9% of the use-cases. 


More information about the Numpy-discussion mailing list