[SciPy-Dev] cs_graph_components behavior
Jacob VanderPlas
vanderplas@astro.washington....
Fri Mar 16 18:01:02 CDT 2012
Hi,
I've been working recently on the sparse graph PR [1]. The only
previously existing sparse graph function in scipy was
scipy.sparse.cs_graph_components, which counts the connected components
in a sparse representation of a graph. There are several issues with
this function:
- the input matrix is assumed to be symmetric. A more general routine
would be better
- the function is implemented in mostly uncommented C++ in a way that
is difficult to read and maintain
- the doc string claims that only the upper triangle of the matrix is
used: this is not the case
- nodes which are not connected to others are given the label -2, and
are not counted in the total number of components
This interface, especially the final point, does not seem very
intuitive. In the equivalent routine in networkx, for example,
single-node components are counted in the total, and they are numbered
the same way as components with more than one node [2].
I'd like to change the behavior of this function to that seen in
networkX, and fix the other issues as well through a cython
implementation. This would involve a breakage of backward compatibility
with this function. I've already written the replacement, as well as
extending it to find strongly and weakly connected components for
directed graphs [1].
Any thoughts on this? Should backward compatibility be broken in order
to have a more intuitive function, and a more consistent interface
across this new submodule? Is there a reason for the old behavior of
this function that I'm missing? One possible solution is to leave the
old function in-place, but deprecate it and reference the new function
(and new behavior) in the warning message.
I'd love some input on this - thanks
Jake
[1] http://github.com/scipy/scipy/pull/119
[2]
http://networkx.lanl.gov/_modules/networkx/algorithms/components/connected.html
More information about the SciPy-Dev
mailing list