[SciPy-user] cluster package

Lev Givon lev at columbia.edu
Mon Jun 12 15:22:47 CDT 2006


Received from Emanuele Olivetti on Mon, Jun 12, 2006 at 11:06:57AM EDT:
>  From a preliminary investigation this is what I got.
> 
> Pycluster relies on the C clustering library. The C clustering library
> uses ranlib.
> 
> According to pycluster website "The C clustering library and Pycluster
> were released under the Python License."[*]. That's good for Scipy.
> 
> Ranlib has a mixed license: ACM restrictive for some functions and
> public domain for the rest of the code. In particular, the ACM license
> is not suitable for Scipy.
> 
> Ranlib is called only inside Pycluster's 'cluster.c', and exactly
> here:
> cluster.c:1359:  setall (iseed1, iseed2);
> cluster.c:1399:  genprm (map, nelements);
> cluster.c:1407:    clusterid[map[i]] = ignuin (0,nclusters-1);
> cluster.c:3161:      { double term = genunf(-1.,1.);
> cluster.c:3173:  genprm (index, nelements);
> 
> 
> So there are just 4 functions used from ranlib:
> 1) setall() : initializes the generator (ACM restrictive
> license, see ranlib's com.c)
> 2) ignuin() : generates a uniformly distributed integer; it uses
> ranlib's ignlgi(), which has the ACM restrictive license
> 3) genunf() : generates a uniformly distributed real; it uses
> ranlib's ranf(), which has the ACM restrictive license
> 4) genprm() : generates a random permutation; it uses ranlib's ignuin()
> 
> It seems that handling those 5 function calls is enough to separate
> Pycluster from ranlib and the ACM restrictive license. Since the 4 ranlib
> functions are used only in Pycluster's cluster.c, in the 'randomassign'
> function (used by the k-means and k-medians high-level functions) and in
> the 'somworker' function (called by the somcluster high-level function),
> it seems not that difficult to call another, more friendly, RNG library.
> 
> Which libraries could substitute for ranlib in Pycluster? As far as I
> understand, Pycluster's use of ranlib has no big performance
> requirements.
> 
> Observations/Corrections/Suggestions are welcome!
> 
> Emanuele
> 
> [*]: Note that this is not exactly what is written inside the source
> package, where there is a standard BSD-like license (see cluster.c),
> whose text has more or less the same meaning as the Python license
> but slightly different wording (a question: which version of the Python
> license do they refer to? That license evolved non-trivially over
> the last few years...). Anyway, we can say that the Pycluster sources,
> except ranlib*, are BSD-like.
> 

numpy makes use of a free (BSD-like license) C implementation of the
Mersenne Twister called randomkit that can be used to generate integer
and real uniform random numbers. A bit of coding can provide a random
permutation generator that uses the randomkit functions.
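
To make the "bit of coding" concrete, here is a rough sketch of the three
pieces cluster.c would need: a uniform integer, a uniform real, and an
in-place random permutation (Fisher-Yates shuffle). All names in it
(rng_state, rng_seed, rng_next, uniform_int, uniform_real, shuffle_ints) are
made up for illustration, and the small xorshift64* generator is only a
stand-in so that the snippet compiles on its own. With randomkit, I would
expect rk_seed(), rk_interval() and rk_double() to take the place of the
stand-in functions, but please check randomkit.h before relying on that.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in state; with randomkit this would be an rk_state. */
typedef struct { uint64_t s; } rng_state;

/* Seed the stand-in generator; the state must never be zero. */
static void rng_seed(rng_state *st, uint64_t seed)
{
    st->s = seed ? seed : 0x9E3779B97F4A7C15ULL;
}

/* One xorshift64* step (stand-in only, not the Mersenne Twister). */
static uint64_t rng_next(rng_state *st)
{
    uint64_t x = st->s;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    st->s = x;
    return x * 0x2545F4914F6CDD1DULL;
}

/* Uniform integer in [lo, hi], both ends included.
   Rejection sampling avoids the bias of a plain modulo. */
static int uniform_int(rng_state *st, int lo, int hi)
{
    uint64_t span, limit, r;
    assert(lo <= hi);
    span = (uint64_t)((int64_t)hi - (int64_t)lo) + 1;
    limit = UINT64_MAX - (UINT64_MAX % span);   /* a multiple of span */
    do {
        r = rng_next(st);
    } while (r >= limit);
    return (int)((int64_t)lo + (int64_t)(r % span));
}

/* Uniform real in [lo, hi), built from the top 53 bits of one draw. */
static double uniform_real(rng_state *st, double lo, double hi)
{
    double u = (double)(rng_next(st) >> 11) * (1.0 / 9007199254740992.0);
    return lo + (hi - lo) * u;
}

/* Fisher-Yates shuffle: permutes the n entries of a[] in place. */
static void shuffle_ints(rng_state *st, int *a, int n)
{
    int i;
    for (i = n - 1; i > 0; i--) {
        int j = uniform_int(st, 0, i);   /* pick a slot among 0..i */
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

In cluster.c the calls quoted above would then become something along the
lines of uniform_int(&st, 0, nclusters-1), uniform_real(&st, -1., 1.) and
shuffle_ints(&st, map, nelements), with the state seeded once where setall()
is called now. The array types in cluster.c may differ from the plain int
used here.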

The latest version is available here (numpy 0.9.8 uses a slightly
older version):

http://www.jeannot.org/~js/code/randomkit-1.6.tgz

							 L.G.


