[SciPy-user] cluster package

Emanuele Olivetti olivetti at itc.it
Mon Jun 12 10:06:57 CDT 2006

From a preliminary investigation, this is what I found.

Pycluster relies on the C clustering library, which in turn uses
ranlib.

According to pycluster website "The C clustering library and Pycluster
were released under the Python License."[*]. That's good for Scipy.

Ranlib has a mixed license: some functions are under the restrictive
ACM license and the rest of the code is in the public domain. The ACM
license in particular is not acceptable for Scipy.

Ranlib is called only inside Pycluster's 'cluster.c', at exactly these lines:
cluster.c:1359:  setall (iseed1, iseed2);
cluster.c:1399:  genprm (map, nelements);
cluster.c:1407:    clusterid[map[i]] = ignuin (0,nclusters-1);
cluster.c:3161:      { double term = genunf(-1.,1.);
cluster.c:3173:  genprm (index, nelements);

So only 4 ranlib functions are used:
1) setall() : initializes the generator (restrictive ACM license,
see ranlib's com.c)
2) ignuin() : generates a uniformly distributed integer, using
ranlib's ignlgi(), which has the restrictive ACM license
3) genunf() : generates a uniformly distributed real, using
ranlib's ranf(), which has the restrictive ACM license
4) genprm() : generates a random permutation, using ranlib's ignuin()
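To give an idea of how small a replacement could be, here is a minimal
sketch of the four routines built on SplitMix64, a small generator that
Sebastiano Vigna released into the public domain. This is only an
assumption about how it could fit Pycluster: the names mirror the ranlib
calls above so the call sites could stay unchanged, but the prototypes
used here (long/double) may differ from ranlib's own (its genunf works
with float, for instance), so they would need checking against ranlib.h.
The seeding scheme in setall() is my own choice, not ranlib's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ranlib-free replacements for setall/genunf/ignuin/genprm.
   Generator: SplitMix64 (public domain). Prototypes are assumptions. */

static uint64_t rng_state = 0x853c49e6748fea9bULL;

/* Advance the SplitMix64 state and return 64 random bits. */
static uint64_t next_u64(void)
{
    uint64_t z = (rng_state += 0x9e3779b97f4a7c15ULL);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}

/* Like setall(): seed the generator from two integers. */
void setall(long iseed1, long iseed2)
{
    rng_state = ((uint64_t)(uint32_t)iseed1 << 32) | (uint32_t)iseed2;
}

/* Like genunf(): real number uniformly distributed in [low, high).
   Uses the top 53 bits, so u is in [0, 1). */
double genunf(double low, double high)
{
    double u = (double)(next_u64() >> 11) / 9007199254740992.0; /* 2^53 */
    return low + (high - low) * u;
}

/* Like ignuin(): integer uniformly distributed in [low, high].
   Rejection sampling avoids modulo bias; assumes high - low fits in a long. */
long ignuin(long low, long high)
{
    assert(low <= high);
    uint64_t range = (uint64_t)high - (uint64_t)low + 1u;
    if (range == 0) /* full 64-bit range: every value is acceptable */
        return (long)next_u64();
    uint64_t limit = UINT64_MAX - UINT64_MAX % range; /* multiple of range */
    uint64_t r;
    do {
        r = next_u64();
    } while (r >= limit);
    return low + (long)(r % range);
}

/* Like genprm(): random permutation of iarray in place (Fisher-Yates). */
void genprm(long *iarray, int larray)
{
    for (int i = larray - 1; i > 0; i--) {
        int j = (int)ignuin(0, i);
        long tmp = iarray[i];
        iarray[i] = iarray[j];
        iarray[j] = tmp;
    }
}
```

Since the permutation is built on top of ignuin(), as in ranlib, only
next_u64() would need to change to try a different generator.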

It seems that replacing those 5 calls is enough to separate Pycluster
from ranlib and its restrictive ACM license. Since the 4 ranlib
functions are used only in two places in Pycluster's cluster.c, the
'randomassign' function (used by the k-means and k-medians high-level
functions) and the 'somworker' function (called by the somcluster
high-level function), it does not seem difficult to switch to another
RNG library with a friendlier license.
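The idea can be taken one step further: if cluster.c drew its random
numbers through a small interface, the RNG library could be swapped
without touching the clustering code. The sketch below is hypothetical
(assign_random_clusters, uniform_int_fn and lcg_uniform are names I made
up, and I am only guessing that randomassign works roughly this way from
the genprm/ignuin calls at lines 1399 and 1407). It shuffles the element
indices, gives one element to each cluster so that none is empty, and
assigns the remaining elements uniformly at random.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical RNG interface: returns an integer uniform in [low, high]. */
typedef long (*uniform_int_fn)(void *state, long low, long high);

/* Example backend only: 64-bit LCG (Knuth's MMIX constants), using the
   top 31 bits. The modulo reduction is slightly biased; fine for a demo. */
static long lcg_uniform(void *state, long low, long high)
{
    uint64_t *s = state;
    *s = *s * 6364136223846793005ULL + 1442695040888963407ULL;
    uint64_t range = (uint64_t)(high - low) + 1u;
    return low + (long)((*s >> 33) % range);
}

/* Hypothetical randomassign-style routine: put each of nelements items
   into one of nclusters clusters, guaranteeing no cluster is empty.
   Returns 0 on success, -1 if memory allocation fails. */
int assign_random_clusters(int nclusters, int nelements, int clusterid[],
                           uniform_int_fn rand_int, void *state)
{
    assert(nclusters >= 1 && nclusters <= nelements);
    int *map = malloc((size_t)nelements * sizeof *map);
    if (map == NULL)
        return -1;
    for (int i = 0; i < nelements; i++)
        map[i] = i;
    /* Fisher-Yates shuffle: map becomes a random permutation. */
    for (int i = nelements - 1; i > 0; i--) {
        int j = (int)rand_int(state, 0, i);
        int tmp = map[i];
        map[i] = map[j];
        map[j] = tmp;
    }
    /* The first nclusters elements of the permutation seed one cluster each. */
    for (int i = 0; i < nclusters; i++)
        clusterid[map[i]] = i;
    /* The remaining elements go to a uniformly chosen cluster. */
    for (int i = nclusters; i < nelements; i++)
        clusterid[map[i]] = (int)rand_int(state, 0, nclusters - 1);
    free(map);
    return 0;
}
```

Passing the generator as a function pointer plus state keeps the
clustering code independent of any particular RNG library; the LCG
backend is there only to make the example runnable.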

Which libraries could replace ranlib in Pycluster? As far as I
understand, Pycluster's use of ranlib has no significant performance
requirements.

Observations/Corrections/Suggestions are welcome!


[*]: Note that this is not exactly what is written in the source
package, which carries a standard BSD-like license (see cluster.c)
whose text has more or less the same meaning as the Python license
but slightly different wording (a question: which version of Python
do they refer to? That license evolved non-trivially over the last
few years...). Anyway, we can say that the Pycluster sources, except
ranlib*, are BSD-like.

Emanuele Olivetti wrote:
 > Robert Kern wrote:
 >>   http://orion.math.iastate.edu/burkardt/c_src/ranlib/ranlib_intro.txt
 >> Why nobody ever reads the RANLIB license is a mystery to me.
 > Thanks a lot for mentioning ranlib's license problems. I completely missed it.
 > I'm investigating into Pycluster to see which parts of ranlib are actually used.
 > As far as I read from ranlib's license some code is public domain and some other
 > is ACM restrictive license (source code seems clear enough to understand which
 > part is one license and which part is the other one).
 > If you have suggestions on this point please let me know.
 > Emanuele
