[Scipy-tickets] [SciPy] #1861: _connected_components_directed segfaults on large graphs

SciPy Trac scipy-tickets@scipy....
Thu Mar 7 22:27:09 CST 2013


#1861: _connected_components_directed segfaults on large graphs
----------------------------------+-----------------------------------------
 Reporter:  timl2                 |       Owner:  jakevdp
     Type:  defect                |      Status:  new    
 Priority:  normal                |   Milestone:  0.13.0 
Component:  scipy.sparse.csgraph  |     Version:  devel  
 Keywords:                        |  
----------------------------------+-----------------------------------------
 In
 https://github.com/scipy/scipy/blob/master/scipy/sparse/csgraph/_traversal.pyx
 the function {{{_connected_components_directed}}} uses a recursive
 algorithm ({{{_cc_visit}}}).

 Using this algorithm on graphs with more than ~13700 nodes causes a
 segfault due to running out of room for the call stack (verified with
 gdb).

 The algorithm can be implemented as an iterative algorithm which allows it
 to handle graphs of up to ~100 million nodes.

 The current implementation is based on the following paper:

 http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.1707

 which is novel due to using a minimal amount of memory. This result
 depends on using the iterative version of the algorithm. The recursive
 version is more memory intense, even when optimally implemented.

 Because the code is written in cython, the generated C code is nowhere
 near optimal, resulting in a massive call stack, which makes the recursive
 solution even worse.

 I have implemented and tested the iterative algorithm and will submit a
 pull request.

-- 
Ticket URL: <http://projects.scipy.org/scipy/ticket/1861>
SciPy <http://www.scipy.org>
SciPy is open-source software for mathematics, science, and engineering.


More information about the Scipy-tickets mailing list