[Scipy-tickets] [SciPy] #866: cluster.kmeans2 has incorrect random init for 2 column data

SciPy scipy-tickets@scipy....
Wed Feb 4 20:49:43 CST 2009


#866: cluster.kmeans2 has incorrect random init for 2 column data
---------------------------+------------------------------------------------
 Reporter:  josefpktd      |       Owner:  somebody
     Type:  defect         |      Status:  new     
 Priority:  normal         |   Milestone:  0.8.0   
Component:  scipy.cluster  |     Version:  devel   
 Severity:  normal         |    Keywords:          
---------------------------+------------------------------------------------
 cluster.vq._krandinit does not create a bivariate random sample with the
 desired covariance, because np.cov returns a scalar rather than a
 covariance matrix for data with 2 columns (or rows). (I think this
 behavior changed not very long ago.)
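
 A minimal sketch of the shape check that would catch this, assuming a
 current NumPy release (the behavior of the development version discussed
 in this ticket may differ). The test data and the variable names are
 illustrative and are not taken from scipy.cluster.vq:

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative data: 500 observations (rows) of 2 variables (columns).
data = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=500)

# rowvar=False tells np.cov that each column is a variable;
# np.atleast_2d keeps the result a matrix even for a single variable.
cov = np.atleast_2d(np.cov(data, rowvar=False))

# Guard: the covariance must be square, with one row per data column.
assert cov.shape == (data.shape[1], data.shape[1])
```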

 As a consequence, the bivariate normal sample has perfect correlation
 instead of the correlation of the data.

 Instead of doing the Cholesky decomposition to generate the multivariate
 normal, the numpy function np.random.multivariate_normal could be used
 (but I think that wouldn't make a difference, and it does not address the
 np.cov problem).
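
 For illustration, the Cholesky approach can be sketched as follows. This
 is not the code of _krandinit; sample_like is a hypothetical helper, and
 the sketch assumes a current NumPy release and a positive definite sample
 covariance:

```python
import numpy as np

def sample_like(data, k, rng):
    """Draw k points from a Gaussian with the mean and covariance of data."""
    mu = data.mean(axis=0)
    cov = np.atleast_2d(np.cov(data, rowvar=False))
    # If cov = L @ L.T and z ~ N(0, I), then z @ L.T + mu ~ N(mu, cov).
    chol = np.linalg.cholesky(cov)
    z = rng.standard_normal((k, mu.size))
    return z @ chol.T + mu

rng = np.random.default_rng(0)
data = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=500)
draws = sample_like(data, 500, rng)

# The off-diagonal correlation of the draws should be close to that of
# the data, and in particular should not be 1.
corr_data = np.corrcoef(data, rowvar=False)[0, 1]
corr_draws = np.corrcoef(draws, rowvar=False)[0, 1]
```

 Passing the same mu and cov to a multivariate normal sampler should give
 draws with the same distribution, so the choice between the two does not
 affect the covariance problem.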

 Example:

 {{{
 >>> bvn = np.random.multivariate_normal([0,0],[[1,0.5],[0.5,1]],500)
 >>> r2d=cluster.vq._krandinit(bvn,500).shape
 >>> np.corrcoef(r2d)
 1

 >>> np.corrcoef(bvn, rowvar=0)
 array([[ 1.       ,  0.5018332],
        [ 0.5018332,  1.       ]])
 }}}

 Other cases work correctly, e.g. for data with 3 columns:
 {{{
 >>> r3d=cluster.vq._krandinit(rn3d,500)
 >>> np.corrcoef(r3d, rowvar=0)
 array([[ 1.        ,  0.56225876,  0.90405282],
        [ 0.56225876,  1.        ,  0.52268196],
        [ 0.90405282,  0.52268196,  1.        ]])
 >>> np.corrcoef(rn3d, rowvar=0)
 array([[ 1.        ,  0.51687592,  0.90504051],
        [ 0.51687592,  1.        ,  0.47141087],
        [ 0.90504051,  0.47141087,  1.        ]])
 }}}

-- 
Ticket URL: <http://scipy.org/scipy/scipy/ticket/866>
SciPy <http://www.scipy.org/>
SciPy is open-source software for mathematics, science, and engineering.