[Numpy-discussion] Another reference count leak: ticket #848
Travis E. Oliphant
oliphant@enthought....
Fri Jul 18 22:07:24 CDT 2008
> Looking at the uses of PyArray_FromAny I can see the motivation for this
> design: core/include/numpy/ndarrayobject.h has a lot of calls which take a
> value returned by PyArray_DescrFromType as argument. This has prompted me
> to take a trawl through the code to see what else is going on, and I note
> a couple more issues with patches below.
>
The core issue is that NumPy grew out of Numeric. In Numeric
PyArray_Descr was just a C-structure, but in NumPy it is now a real
Python object with reference counting. Trying to have a compatible
C-API to the old one and making the transition with out huge changes to
the C-API is what led to the "stealing" strategy.
I did not just out the blue decide to do it that way. Yes, it is a bit
of a pain, and yes, it isn't the way it is always done, but as you point
out there are precedents, and so that's the direction I took. It is
*way* too late to change that in any meaningful way
> In the patch below the problem being fixed is that the first call to
> PyArray_FromAny can result in the erasure of dtype *before* Py_INCREF is
> called. Perhaps you can argue that this only occurs when NULL is
> returned...
>
Indeed I would argue that because the array object holds a reference to
the typecode (data-type object). Only if the call returns NULL does the
data-type object lose it's reference count, and in fact that works out
rather nicely and avoids a host of extra Py_DECREFs.
> The next patch deals with an interestingly subtle memory leak in
> _string_richcompare where if casting to a common type fails then a
> reference count will leaked. Actually this one has nothing to do with
> PyArray_FromAny, but I spotted it in passing.
>
This is a good catch. Thanks!
> I really don't think that this design of reference count handling in
> PyArray_FromAny (and consequently PyArray_CheckFromAny) is a good idea.
>
Your point is well noted, but again given the provenance of the code, I
still think it was the best choice. And, yes, it is too late to change it.
> Not only is this not a good idea, it's not documented in the API
> documentation (I'm referring to the "Guide to NumPy" book)
Hmmm... Are you sure? I remember writing a bit about in the paragraphs
that describe the relevant API calls. But, you could be right.
> I've been trying to find some documentation on stealing references. The
> Python C API reference (http://docs.python.org/api/refcountDetails.html)
> says
>
> Few functions steal references; the two notable exceptions are
> PyList_SetItem() and PyTuple_SetItem()
>
> An interesting essay on reference counting is at
> http://lists.blender.org/pipermail/bf-python/2005-September/003092.html
>
Believe me, I understand reference counting pretty well. Still, it can
be tricky to do correctly and it is easy to forget corner cases and
error-returns. I very much appreciate your careful analysis, but I
did an analysis of my own when I wrote the code, and so I will be
resistant to change things if I can't see the error.
I read something from Guido once that said something to the effect that
nothing beats studying the code to get reference counting right. I
think this is true.
> In conclusion, I can't find much about the role of stealing in reference
> count management, but it's such a source of surprise (and frankly doesn't
> actually work out all that well in numpy)
I strongly beg to differ. This sounds very naive to me. IMHO it has
worked out extremely well in converting the PyArray_Descr C-structure
into the data-type objects that adds so much power to NumPy.
Yes, there are a few corner cases that you have done an excellent job in
digging up, but they are "corner" cases that don't cause problems for
99.9% of the use-cases.
-Travis
More information about the Numpy-discussion
mailing list