#1246: kmeans gives unexpected results for integer data

Fri Jul 23 15:32:53 CDT 2010

 Reporter:  lutz
     Type:  defect         |      Status:  new     
 Priority:  low            |   Milestone:  0.9.0   
Component:  scipy.cluster  |     Version:  0.7.0   
 Keywords:  kmeans         |  
 The current implementation of the kmeans clustering algorithm uses integer
 arithmetic when supplied with integer data, which may give unexpected
 results:

 >>> import numpy as np
 >>> from scipy.cluster.vq import kmeans
 >>> kmeans(np.array([1,2]), 1)
 (array([1]), 0.5)
 >>> kmeans(np.array([1.,2.]), 1)
 (array([ 1.5]), 0.5)

 Other functions that require the calculation of means automatically upcast
 to a floating point data type, for example:

 >>> np.mean(np.array([1,2]))
 1.5
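
 To illustrate how the fractional part can get lost, here is a minimal
 NumPy-only sketch. It is not the scipy.cluster code; it only assumes that a
 centroid ends up stored in an array that has the same integer dtype as the
 input. The names float_mean and int_centroid are made up for this example.

```python
import numpy as np

obs = np.array([1, 2])

# np.mean upcasts integer input and returns a floating point result.
float_mean = np.mean(obs)

# Assumption for illustration: the centroid array inherits the integer
# dtype of the observations, so assigning the mean truncates it.
int_centroid = np.zeros(1, dtype=obs.dtype)
int_centroid[0] = float_mean

print(float_mean, int_centroid)
```

 The mean itself is computed in floating point; the truncation in this
 sketch happens only when the value is written into the integer array.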

 Possible solutions include:
 - automatic type conversion to a floating point data type
 - a warning in the docstring
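
 Until one of these is adopted, a caller can do the conversion by hand. The
 sketch below shows what the first option could look like from the user's
 side. kmeans_float is a hypothetical helper, not part of scipy, and it only
 converts the observations (an array passed as the initial guess would need
 the same treatment).

```python
import numpy as np
from scipy.cluster.vq import kmeans


def kmeans_float(obs, k_or_guess, **kwargs):
    """Hypothetical wrapper: cast observations to float64, then run kmeans."""
    # Casting up front means all centroid arithmetic is done in floating point.
    obs = np.asarray(obs, dtype=np.float64)
    return kmeans(obs, k_or_guess, **kwargs)


# Integer observations, one feature per row, clustered into one group.
codebook, distortion = kmeans_float(np.array([[1], [2]]), 1)
print(codebook, distortion)
```

 With a single cluster the centroid is the mean of all observations, so the
 call above should match the floating point result shown in the report.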

Ticket URL: <http://projects.scipy.org/scipy/ticket/1246>

