[Numpy-discussion] How to debug reference counting errors
Fri Aug 31 20:05:32 CDT 2012
On Fri, Aug 31, 2012 at 5:56 PM, Mark Wiebe <firstname.lastname@example.org> wrote:
> On Fri, Aug 31, 2012 at 5:35 PM, Ondřej Čertík <email@example.com> wrote:
>> Hi Dag,
>> On Fri, Aug 31, 2012 at 4:22 AM, Dag Sverre Seljebotn
>> <firstname.lastname@example.org> wrote:
>> > On 08/31/2012 09:03 AM, Ondřej Čertík wrote:
>> >> Hi,
>> >> There is a segfault reported here:
>> >> http://projects.scipy.org/numpy/ticket/1588
>> >> I've managed to isolate the problem and even provide a simple patch
>> >> that fixes it here:
>> >> https://github.com/numpy/numpy/issues/398
>> >> However, the patch simply skips decrementing the relevant reference,
>> >> so it might leak. I've used bisection (it took the whole evening,
>> >> unfortunately...), but the good news is that I've isolated the
>> >> commits that actually broke it. See github issue #398 for details,
>> >> diffs, etc.
>> >> Unfortunately, it's 12 commits from Mark, and the individual commits
>> >> raise an exception on the segfaulting code, so I can't pinpoint the
>> >> problem further.
>> >> In general, how can I debug this sort of problem? I tried to use
>> >> valgrind with a debugging build of numpy, but it reports tons of
>> >> (false?) positives:
>> >> https://gist.github.com/3549063
>> >> Mark, looking at the changes that broke it, as well as at my "fix",
>> >> do you see where the problem could be? I suspect it is something in
>> >> the changes to PyArray_FromAny() or PyArray_FromArray() in ctors.c,
>> >> but so far I don't see anything that could cause it.
>> >> Thanks for any help. This is one of the issues blocking the 1.7.0
>> >> release.
>> > IIRC you can recompile Python with some support for detecting memory
>> > leaks. One of the issues with using Valgrind, even after suppressing
>> > the false positives, is that Python uses its own memory allocator,
>> > which sits between the bug and what Valgrind detects. So at least
>> > recompile Python so that it doesn't do that.
>> Right. Compiling with "--without-pymalloc" (per README.valgrind, as
>> suggested above by Richard) should improve things a lot. Thanks for the tip.
>> > As for hardening the NumPy source in general, you should at least be
>> > aware of these two options:
>> > 1) David Malcolm (email@example.com) was writing a static code
>> > analysis plugin for gcc that would check, for every routine, that the
>> > reference-count semantics were correct. (I don't know how far he's got
>> > with that.)
>> > 2) In Cython we have a "reference count nanny". It requires changes to
>> > all the code, though, so it's not an option just for finding this bug;
>> > I just thought I'd mention it. In addition to INCREF/DECREF, you need
>> > to insert new "GIVEREF" and "GOTREF" calls (which are no-ops in a
>> > normal compile) to declare where you get and give away a reference.
>> > When Cython-generated sources are compiled with -DCYTHON_REFNANNY,
>> > INCREF/DECREF/GIVEREF/GOTREF are tracked within each function, and a
>> > failure is raised if the function violates any contract.
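To make the refnanny idea more concrete, here is a toy C++ sketch of a per-function tracker. It is not Cython's actual refnanny implementation: `Obj`, `RefNanny`, and the `gotref`/`giveref`/`incref`/`decref`/`finish` methods are hypothetical names made up for illustration. It assumes the contract described above: every reference a function acquires (GOTREF or INCREF) must be released (DECREF) or handed off (GIVEREF) before the function returns, and a DECREF or GIVEREF on a reference the function never owned is a violation.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a PyObject: only the reference count matters here.
struct Obj {
    long refcnt = 1;
};

// Toy per-function reference tracker in the spirit of Cython's refnanny.
// All names are hypothetical; this is not Cython's implementation.
class RefNanny {
public:
    explicit RefNanny(std::string func) : func_(std::move(func)) {}

    // GOTREF: this function now owns one reference
    // (e.g. a new reference returned by an API call).
    void gotref(Obj* o) { assert(o); ++owned_[o]; }

    // GIVEREF: one owned reference is handed away
    // (e.g. returned to the caller or stolen by a container).
    void giveref(Obj* o) { release(o, "GIVEREF"); }

    // INCREF/DECREF change both the real count and this function's ownership.
    void incref(Obj* o) { assert(o); ++o->refcnt; ++owned_[o]; }
    void decref(Obj* o) {
        assert(o);
        release(o, "DECREF");
        --o->refcnt;  // the real DECREF still happens, even on a violation
    }

    // Call at function exit: any reference still owned is reported as a leak.
    std::vector<std::string> finish() const {
        std::vector<std::string> errors = errors_;
        for (const auto& entry : owned_) {
            if (entry.second != 0) {
                errors.push_back(func_ + ": leaked " +
                                 std::to_string(entry.second) + " reference(s)");
            }
        }
        return errors;
    }

private:
    // Drop one owned reference, or record a violation if none is owned.
    void release(Obj* o, const char* op) {
        auto it = owned_.find(o);
        if (it == owned_.end() || it->second == 0) {
            errors_.push_back(func_ + ": " + op + " on a reference it does not own");
            return;
        }
        --it->second;
    }

    std::string func_;
    std::map<Obj*, long> owned_;
    std::vector<std::string> errors_;
};
```

For example, a function that calls `gotref` and then returns without a matching `decref` or `giveref` gets one "leaked" entry from `finish()`, which is the missing-DECREF kind of bug discussed in this thread; an unmatched `decref` is reported as a violation instead of silently over-releasing.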
>> I see. That's a nice option. For my own code, I never manage reference
>> counts by hand; I just use Cython instead.
>> In the meantime, Mark fixed it:
>> Mark, thanks again for this. That saved me a lot of time.
> No problem. The way I prefer to deal with this kind of error is to use C++
> smart pointers. C++11's unique_ptr and boost's intrusive_ptr are both useful
> for painlessly managing this kind of reference-counting headache.
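A minimal sketch of the smart-pointer approach, using only the standard library: `std::unique_ptr` with a custom deleter that drops one reference instead of calling `delete`. The `Obj` type, `obj_incref`/`obj_decref`, `make_new`, and the other helpers are hypothetical stand-ins for a refcounted C object and its API (with the Python C API they would roughly correspond to `PyObject`, `Py_INCREF`, and `Py_DECREF`). boost's `intrusive_ptr` is not shown.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a refcounted C object such as a PyObject.
struct Obj {
    long refcnt = 1;
};

int g_freed = 0;  // counts deallocations so the behavior is observable

// Hypothetical analogues of Py_INCREF / Py_DECREF.
void obj_incref(Obj* o) { ++o->refcnt; }
void obj_decref(Obj* o) {
    assert(o != nullptr);
    if (--o->refcnt == 0) {
        delete o;
        ++g_freed;
    }
}

// Deleter that drops one reference instead of calling delete directly.
struct DecRef {
    void operator()(Obj* o) const noexcept { obj_decref(o); }
};

// Owning handle for exactly one reference; unique_ptr never invokes the
// deleter on nullptr, so an empty handle releases nothing.
using ObjRef = std::unique_ptr<Obj, DecRef>;

// Hypothetical API call that returns a new reference.
ObjRef make_new() { return ObjRef(new Obj); }

// Turn a borrowed reference into an owned one by taking an extra reference.
ObjRef own_borrowed(Obj* borrowed) {
    obj_incref(borrowed);
    return ObjRef(borrowed);
}

// Every exit path releases the temporary; no hand-written DECREF is needed.
bool process(bool fail_early) {
    ObjRef tmp = make_new();
    if (fail_early) {
        return false;  // tmp's reference is dropped here...
    }
    // ... use tmp.get() ...
    return true;  // ...and here
}

// Hand ownership to C code that expects to receive a new reference.
Obj* give_away(ObjRef ref) { return ref.release(); }
```

The design choice here is that each handle represents exactly one owned reference: early returns and exceptions release it automatically, and `release()` is the single, explicit point where ownership is handed back to C code.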
Oh yes. I prefer to use Trilinos' RCP, which is a shared pointer (just
like C++11's shared_ptr), but gives better debugging info if something
goes wrong. It can be compiled in two modes: one is slower but can't
segfault, and the other is optimized, with most operations at native
raw-pointer speed, but it can segfault.
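RCP's own API is not reproduced here. As a hedged illustration of the checked-versus-fast trade-off described above, the hypothetical `View<T>` below is a non-owning handle to an object owned by a `std::shared_ptr`. With the hypothetical `CHECKED_REFS` switch set to 1, every dereference goes through a `std::weak_ptr` and throws on a dangling reference instead of crashing. With it set to 0, the handle is a plain raw pointer: as fast as raw access, but dangling use is undefined behavior (typically a segfault).

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical build switch (not a Trilinos macro):
// 1 = checked mode (slower, throws instead of crashing),
// 0 = unchecked mode (raw-pointer speed, dangling use is undefined behavior).
#ifndef CHECKED_REFS
#define CHECKED_REFS 1
#endif

// Hypothetical non-owning handle to an object owned by a std::shared_ptr.
template <typename T>
class View {
public:
    explicit View(const std::shared_ptr<T>& owner)
#if CHECKED_REFS
        : weak_(owner)
#else
        : raw_(owner.get())
#endif
    {
        assert(owner && "View needs a live owner");
    }

    T& operator*() const {
#if CHECKED_REFS
        // Checked: confirm the owner is still alive before touching memory.
        std::shared_ptr<T> alive = weak_.lock();
        if (!alive) {
            throw std::logic_error("dangling View: owner already released");
        }
        // Still owned elsewhere, so the reference stays valid
        // (single-threaded assumption).
        return *alive;
#else
        return *raw_;  // unchecked: a dangling raw_ here is undefined behavior
#endif
    }

    T* operator->() const { return std::addressof(**this); }

private:
#if CHECKED_REFS
    std::weak_ptr<T> weak_;
#else
    T* raw_;
#endif
};
```

Compiling with `-DCHECKED_REFS=0` replaces the `weak_ptr` lookup with a raw pointer, so the same source builds either way; this mirrors the two RCP build modes in spirit only.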