[SciPy-user] kmeans2 bug for 1D data?

Lin Shao shao@msg.ucsf....
Wed Apr 2 13:00:41 CDT 2008

I understand that kmeans2 is usually used on multidimensional vector
spaces, but it is sometimes useful for 1D clustering, such as
clustering the pixels of an image based solely on pixel intensity.
And kmeans2 does nominally support 1D data, as stated in its
docstring:

def kmeans2(data, k, iter = 10, thresh = 1e-5, minit = 'random',
            missing = 'warn'):
        data : ndarray
            Expect a rank 1 or 2 array. Rank 1 are assumed to describe one
            dimensional data, rank 2 multidimensional data, in which case one
            row is one observation.

In practice, however, if data is 1D and minit is 'random', the call
to vq(data, code) in _kmeans() fails with an error, apparently
because code is a rank-2 array. The cause is in _krandinit(), whose
return value, x, is rank 2 regardless of whether the input data is
rank 1 or rank 2. I "fixed" it by replacing "return x" with
"return x.squeeze()". I'm not sure if that's the right way.
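
To illustrate the rank mismatch with plain NumPy, here is a minimal
sketch. It does not call any SciPy internals. The match_rank helper is
hypothetical (it is not part of scipy.cluster.vq), and the (k, 1)
codebook shape is assumed from the description above. It shows what
squeeze() does to such a codebook, and one case (k == 1) where
squeeze() would drop to rank 0, which a rank-aware reshape avoids.

```python
import numpy as np

def match_rank(code, data):
    """Hypothetical helper: give the codebook the same rank as the data."""
    code = np.asarray(code)
    if np.ndim(data) == 1:
        # Flatten a (k, 1) codebook to (k,); stays rank 1 even when k == 1.
        return code.reshape(-1)
    # Rank-2 data: leave the codebook untouched.
    return code

data = np.array([0.1, 0.2, 0.9, 1.0])   # rank-1 observations
code = np.array([[0.15], [0.95]])       # assumed rank-2 codebook, shape (2, 1)

print(code.squeeze().shape)                        # (2,)
print(np.array([[0.5]]).squeeze().shape)           # () when k == 1
print(match_rank(np.array([[0.5]]), data).shape)   # (1,)
```

So squeeze() does restore rank 1 for k > 1, but conditioning on the
rank of the input may be the safer variant if k == 1 has to work too.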

The magic of the microscope is not that it makes little creatures
larger, but that it makes a large one smaller.
