[SciPy-user] Mysterious kmeans() error

Roy H. Han starsareblueandfaraway@gmail....
Fri Feb 6 08:37:23 CST 2009


Well, I suspect there are numerical problems in scipy's kmeans2(),
at least in scipy 0.6.0.
I changed the code to try to ensure that no cluster ends up empty.
Pycluster seems to be the better choice for clustering for now.

Each vector has three components (number of columns = 3), but kmeans
should still work even if one of the clusters contains only a single
vector (number of rows = 1).

This is a bug.
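
In the meantime, a guard along these lines might sidestep the failure.
It is only a minimal sketch. It assumes, based on the traceback (which
stops inside _krandinit at a Cholesky factorization of `cov`), that the
random initialization needs a positive definite covariance of the rows
passed in. The name enough_rows_for_random_init is hypothetical and not
part of scipy.

```python
import numpy as np


def enough_rows_for_random_init(vectors, k=2):
    """Hypothetical pre-check before calling kmeans2 with random init.

    A sample covariance built from n rows has rank at most n - 1, so it
    cannot be positive definite unless n exceeds the number of columns.
    This is a necessary condition only, not a sufficient one.
    """
    vectors = np.atleast_2d(np.asarray(vectors, dtype=float))
    n_rows, n_columns = vectors.shape
    return n_rows >= k and n_rows > n_columns


single = [[1.0, 2.0, 3.0]]  # one vector, three columns
several = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]

print(enough_rows_for_random_init(single))   # False
print(enough_rows_for_random_init(several))  # True
```

A caller could skip the split, or choose a different initialization,
whenever the check returns False.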


On Fri, Feb 6, 2009 at 9:29 AM, Roy H. Han
<starsareblueandfaraway@gmail.com> wrote:
> Thanks, Josef.
>
> It seems that it happens when one of the clusters becomes empty.
> Pycluster never seems to have the problem of empty clusters though.
>
>
> /usr/lib64/python2.5/site-packages/scipy/cluster/vq.py:477:
> UserWarning: One of the clusters is empty. Re-run kmean with a
> different initialization.
>  warnings.warn("One of the clusters is empty. "
>
> Traceback (most recent call last):
>  File "clusterProbabilities.py", line 88, in <module>
>    run(taskName, parameterByName)
>  File "clusterProbabilities.py", line 57, in run
>    locationGeoFrame = probability_process.cluster(targetLocationPath,
> probabilityPath, iterationCountPerBurst, maximumGeoDiameter,
> minimumGeoDiameter)
>  File "/mnt/windows/svn/networkPlanner/acquisition/libraries/probability_process.py",
> line 33, in cluster
>    windowLocations = grapeCluster(vectors, iterationCountPerBurst,
> maximumPixelDiameter, minimumPixelDiameter)
>  File "/mnt/windows/svn/networkPlanner/acquisition/libraries/probability_process.py",
> line 66, in grapeCluster
>    assignments = scipy.cluster.vq.kmeans2(globalCluster, k=2,
> iter=iterationCountPerBurst)[1]
>  File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line
> 563, in kmeans2
>    clusters = init(data, k)
>  File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line
> 469, in _krandinit
>    x = N.dot(x, N.linalg.cholesky(cov).T) + mu
>  File "/usr/lib64/python2.5/site-packages/numpy/linalg/linalg.py",
> line 418, in cholesky
>    Cholesky decomposition cannot be computed'
> numpy.linalg.linalg.LinAlgError: Matrix is not positive definite -
>    Cholesky decomposition cannot be computed
>
>
>
> On Fri, Feb 6, 2009 at 9:08 AM,  <josef.pktd@gmail.com> wrote:
>> On Fri, Feb 6, 2009 at 8:42 AM, Roy H. Han
>> <starsareblueandfaraway@gmail.com> wrote:
>>> Thanks, Josef.  This doesn't really answer my question, but thanks for
>>> your response.
>>>
>>>
>>> Date: Wed, 4 Feb 2009 12:44:27 -0500
>>> From: josef.pktd@gmail.com
>>> Subject: Re: [SciPy-user] Mysterious kmeans() error
>>> To: SciPy Users List <scipy-user@scipy.org>
>>>
>>> On Wed, Feb 4, 2009 at 12:28 PM, Roy H. Han
>>> <starsareblueandfaraway@gmail.com> wrote:
>>>> As a side comment, if I use Pycluster, then the clustering proceeds
>>>> without error.
>>>>
>>>> On Wed, Feb 4, 2009 at 11:31 AM, Roy H. Han
>>>> <starsareblueandfaraway@gmail.com> wrote:
>>>>> Has anyone seen this error before?  I have no idea what it means.  I'm
>>>>> using version 0.6.0 packaged for Fedora.
>>>>> I'm getting this error using the kmeans2() implementation in scipy.cluster.vq
>>>>>
>>>>>
>>>>>  File "/mnt/windows/svn/networkPlanner/acquisition/libraries/probability_process.py",
>>>>> line 55, in grapeCluster
>>>>>    assignments = scipy.cluster.vq.kmeans2(globalCluster, k=2,
>>>>> iter=iterationCountPerBurst)[1]
>>>>>  File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line
>>>>> 563, in kmeans2
>>>>>    clusters = init(data, k)
>>>>>  File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line
>>>>> 469, in _krandinit
>>>>>    x = N.dot(x, N.linalg.cholesky(cov).T) + mu
>>>>>  File "/usr/lib64/python2.5/site-packages/numpy/linalg/linalg.py",
>>>>> line 418, in cholesky
>>>>>    Cholesky decomposition cannot be computed'
>>>>> numpy.linalg.linalg.LinAlgError: Matrix is not positive definite -
>>>>>    Cholesky decomposition cannot be computed
>>>
>>> This is just a general answer; I have never used scipy.cluster.
>>>
>>> The error message means that the covariance matrix of your data,
>>> np.cov(data), is not positive definite. Check your data for linear
>>> dependence, e.g. look at the eigenvalues of np.cov(data).
>>>
>>> If that's not the source of the error, then a cluster expert is needed.
>>>
>>> Josef
>>>
>>
>> I looked a bit more, and I get the same error if the data has more
>> columns than rows.
>> The assumption in scipy.cluster is that columns represent random
>> variables and rows represent observations. So if the matrix is
>> transposed, the same exception is raised as in your case.
>>
>> Josef
>>
>> BTW: it's better to reply to individual threads than to the Digest,
>> since that preserves the subject line and threading.
>>
>
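
Josef's two suggestions above (look at the eigenvalues of the covariance,
and check that rows are observations and columns are variables) can be
sketched roughly as follows. The helper names and the rel_tol threshold
are illustrative assumptions, not anything from scipy. rowvar=False is
used so that numpy treats columns as variables, matching the convention
Josef describes.

```python
import numpy as np


def covariance_eigenvalues(data):
    """Eigenvalues (ascending) of the sample covariance of `data`.

    Rows are treated as observations and columns as variables,
    hence rowvar=False.
    """
    data = np.asarray(data, dtype=float)
    cov = np.atleast_2d(np.cov(data, rowvar=False))
    return np.linalg.eigvalsh(cov)


def looks_positive_definite(data, rel_tol=1e-10):
    """True if the smallest eigenvalue is clearly above zero.

    rel_tol is an arbitrary illustrative threshold, taken relative to
    the largest eigenvalue.
    """
    eigenvalues = covariance_eigenvalues(data)
    return bool(eigenvalues[0] > rel_tol * eigenvalues[-1])


# 4 observations of 3 variables, in general position.
good = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)

# Third column is the sum of the first two (linear dependence).
dependent = np.array(
    [[0, 0, 0], [1, 0, 1], [0, 1, 1], [1, 1, 2], [2, 1, 3]], dtype=float
)

print(looks_positive_definite(good))       # True
print(looks_positive_definite(good.T))     # False: more columns than rows
print(looks_positive_definite(dependent))  # False: dependent column
```

A smallest eigenvalue at or near zero is the situation in which a
Cholesky factorization would be expected to fail.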

