[Scipy-tickets] [SciPy] #1861: _connected_components_directed segfaults on large graphs
SciPy Trac
scipy-tickets@scipy....
Thu Mar 7 22:27:09 CST 2013
#1861: _connected_components_directed segfaults on large graphs
----------------------------------+-----------------------------------------
Reporter: timl2 | Owner: jakevdp
Type: defect | Status: new
Priority: normal | Milestone: 0.13.0
Component: scipy.sparse.csgraph | Version: devel
Keywords: |
----------------------------------+-----------------------------------------
In
https://github.com/scipy/scipy/blob/master/scipy/sparse/csgraph/_traversal.pyx
the function {{{_connected_components_directed}}} uses a recursive
algorithm ({{{_cc_visit}}}).
Using this algorithm on graphs with more than ~13700 nodes causes a
segfault due to running out of room for the call stack (verified with
gdb).
The algorithm can be implemented as an iterative algorithm which allows it
to handle graphs of up to ~100 million nodes.
The current implementation is based on the following paper:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.1707
which is novel due to using a minimal amount of memory. This result
depends on using the iterative version of the algorithm. The recursive
version is more memory intense, even when optimally implemented.
Because the code is written in cython, the generated C code is nowhere
near optimal, resulting in a massive call stack, which makes the recursive
solution even worse.
I have implemented and tested the iterative algorithm and will submit a
pull request.
--
Ticket URL: <http://projects.scipy.org/scipy/ticket/1861>
SciPy <http://www.scipy.org>
SciPy is open-source software for mathematics, science, and engineering.
More information about the Scipy-tickets
mailing list