[Numpy-discussion] How to debug reference counting errors

Dag Sverre Seljebotn d.s.seljebotn@astro.uio...
Fri Aug 31 06:22:27 CDT 2012


On 08/31/2012 09:03 AM, Ondřej Čertík wrote:
> Hi,
>
> There is segfault reported here:
>
> http://projects.scipy.org/numpy/ticket/1588
>
> I've managed to isolate the problem and even provide a simple patch
> that fixes it here:
>
> https://github.com/numpy/numpy/issues/398
>
> However, the patch simply doesn't decrement the proper reference, so it
> might leak. I used bisection (which unfortunately took the whole
> evening...), but the good news is that I've isolated the commits that
> actually broke it. See GitHub issue #398 for details, diffs, etc.
>
> Unfortunately, it's 12 commits from Mark, and the individual commits
> raise an exception on the segfaulting code, so I can't pinpoint the
> problem further.
>
> In general, how can I debug this sort of problem? I tried to use
> valgrind with a debugging build of numpy, but it reports tons of
> (false?) positives: https://gist.github.com/3549063
>
> Mark, by looking at the changes that broke it, as well as at my "fix",
> do you see where the problem could be?
>
> I suspect it has something to do with the changes to PyArray_FromAny()
> or PyArray_FromArray() in ctors.c, but so far I don't see anything that
> could cause it.
>
> Thanks for any help. This is one of the issues blocking the 1.7.0 release.

IIRC, you can recompile Python with extra support for detecting memory
leaks. One problem with using Valgrind, even after suppressing the
false positives, is that Python uses its own memory allocator, which
sits between the bug and what Valgrind detects. So at the very least,
recompile Python so that it doesn't use its own allocator.
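A complementary check that needs no special build is to call a suspect routine many times and watch the reference count of its argument: a count that grows with the number of calls points at a missing Py_DECREF, while one that shrinks points at an extra one. The sketch below relies only on the standard `sys.getrefcount`; `refcount_delta`, `well_behaved` and `leaky` are hypothetical names, with `leaky` merely simulating a C routine that forgets a Py_DECREF. Separately, on a CPython configured with `--with-pydebug`, `sys.gettotalrefcount()` reports the interpreter-wide total, and `--without-pymalloc` turns off the custom allocator so Valgrind sees the underlying `malloc` calls.

```python
import gc
import sys


def refcount_delta(func, obj, repeat=1000):
    """Call func(obj) `repeat` times and return how much obj's refcount changed.

    Hypothetical helper: a positive delta proportional to `repeat` suggests
    func keeps (leaks) a reference to obj on every call; a negative delta
    suggests an extra DECREF.
    """
    func(obj)  # warm-up call so one-time caching is not counted
    gc.collect()
    before = sys.getrefcount(obj)
    for _ in range(repeat):
        func(obj)
    gc.collect()
    after = sys.getrefcount(obj)
    return after - before


_hidden = []  # where the simulated leak stashes references


def well_behaved(x):
    return len(x)


def leaky(x):
    # Stand-in for a C routine that forgets a Py_DECREF:
    # every call leaves one extra reference to x behind.
    _hidden.append(x)
    return len(x)


data = [1, 2, 3]
print(refcount_delta(well_behaved, data))       # prints 0
print(refcount_delta(leaky, data, repeat=100))  # prints 100
```

Pointing such a helper at the failing NumPy call, with and without a candidate patch, would be one way to see whether the patch leaks; that use is an assumption and was not tried in this thread.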

As for hardening the NumPy source in general, you should at least be 
aware of these two options:

1) David Malcolm (dmalcolm@redhat.com) was writing a static code
analysis plugin for gcc that would check, for every routine, that the
reference-counting semantics were correct. (I don't know how far he has
got with that.)

2) In Cython we have a "reference count nanny". It requires changes to
all the code, though, so it's not an option just for finding this bug;
I just thought I'd mention it. In addition to INCREF/DECREF, you need to
insert new "GIVEREF" and "GOTREF" calls (which are no-ops in a normal
compile) to declare where you receive and give away a reference. When
Cython-generated sources are compiled with -DCYTHON_REFNANNY,
INCREF/DECREF/GIVEREF/GOTREF are tracked within each function, and an
error is raised if the function violates any contract.
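To make the nanny's per-function bookkeeping concrete, here is a toy Python model of the idea: each function keeps a tally of the references it owns, GOTREF/INCREF add to it, GIVEREF/DECREF subtract from it, releasing a reference the function does not own is an error, and anything still owned at exit is reported as a leak. `ToyRefNanny`, `RefNannyError` and the example functions are hypothetical; this is a sketch of the concept, not Cython's actual refnanny, and it tracks labels instead of touching real CPython refcounts.

```python
class RefNannyError(RuntimeError):
    """Raised when a function breaks its reference-ownership contract."""


class ToyRefNanny:
    """Toy per-function reference tracker (hypothetical, not Cython's code)."""

    def __init__(self, funcname):
        self.funcname = funcname
        self.owned = {}  # label -> number of references this function owns

    def _acquire(self, label):
        self.owned[label] = self.owned.get(label, 0) + 1

    def _release(self, label, op):
        if self.owned.get(label, 0) <= 0:
            raise RefNannyError(
                f"{self.funcname}: {op} of {label!r} without an owned reference")
        self.owned[label] -= 1

    # The four declarations mirror the calls described in the text.
    def gotref(self, label):   # received a new reference (e.g. from a call)
        self._acquire(label)

    def incref(self, label):   # took an extra reference
        self._acquire(label)

    def giveref(self, label):  # handed a reference away (stolen or returned)
        self._release(label, "GIVEREF")

    def decref(self, label):   # dropped a reference
        self._release(label, "DECREF")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Check the balance only on a normal exit; let other errors propagate.
        if exc_type is None:
            leaked = {k: n for k, n in self.owned.items() if n}
            if leaked:
                raise RefNannyError(
                    f"{self.funcname}: references still owned at exit: {leaked}")
        return False


def balanced():
    # Models: tmp = PyObject_Call(...); result = PyTuple_New(1);
    #         PyTuple_SET_ITEM(result, 0, tmp); return result;
    with ToyRefNanny("balanced") as nanny:
        nanny.gotref("tmp")      # new reference returned by a call
        nanny.gotref("result")   # new tuple
        nanny.giveref("tmp")     # stolen by PyTuple_SET_ITEM
        nanny.giveref("result")  # returned to the caller
    return "ok"


def missing_decref():
    with ToyRefNanny("missing_decref") as nanny:
        nanny.gotref("tmp")      # new reference, never released: a leak


def extra_decref():
    with ToyRefNanny("extra_decref") as nanny:
        nanny.gotref("tmp")
        nanny.decref("tmp")
        nanny.decref("tmp")      # one DECREF too many: the kind that segfaults
```

Calling `balanced()` returns normally, while `missing_decref()` and `extra_decref()` each raise `RefNannyError` naming the offending function. Checking the balance at function exit, rather than globally, is what lets a tool of this kind point at the exact routine that broke the contract.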

Dag

