[SciPy-Dev] cs_graph_components behavior

Jacob VanderPlas vanderplas@astro.washington....
Fri Mar 16 18:01:02 CDT 2012

I've been working recently on the sparse graph PR [1].  The only 
previously existing sparse graph function in scipy was 
scipy.sparse.cs_graph_components, which counts the connected components 
in a sparse representation of a graph.  There are several issues with 
this function:

  - the input matrix is assumed to be symmetric.  A more general routine 
would be better
  - the function is implemented in mostly uncommented C++ in a way that 
is difficult to read and maintain
  - the doc string claims that only the upper triangle of the matrix is 
used: this is not the case
  - nodes which are not connected to others are given the label -2, and 
are not counted in the total number of components

This interface, especially the final point, does not seem very 
intuitive.  In the equivalent routine in networkx, for example, 
single-node components are counted in the total, and they are numbered 
the same way as components with more than one node [2].

I'd like to change the behavior of this function to that seen in 
networkX, and fix the other issues as well through a cython 
implementation.  This would involve a breakage of backward compatibility 
with this function.  I've already written the replacement, as well as 
extending it to find strongly and weakly connected components for 
directed graphs [1].

Any thoughts on this?  Should backward compatibility be broken in order 
to have a more intuitive function, and a more consistent interface 
across this new submodule?  Is there a reason for the old behavior of 
this function that I'm missing?  One possible solution is to leave the 
old function in-place, but deprecate it and reference the new function 
(and new behavior) in the warning message.

I'd love some input on this - thanks

[1] http://github.com/scipy/scipy/pull/119

More information about the SciPy-Dev mailing list