[SciPy-user] Mysterious kmeans() error

Roy H. Han starsareblueandfaraway@gmail....
Fri Feb 6 08:29:11 CST 2009


Thanks, Josef.

The error seems to occur when one of the clusters becomes empty.
Pycluster never seems to run into empty clusters, though.


/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py:477:
UserWarning: One of the clusters is empty. Re-run kmean with a
different initialization.
  warnings.warn("One of the clusters is empty. "

Traceback (most recent call last):
  File "clusterProbabilities.py", line 88, in <module>
    run(taskName, parameterByName)
  File "clusterProbabilities.py", line 57, in run
    locationGeoFrame = probability_process.cluster(targetLocationPath,
probabilityPath, iterationCountPerBurst, maximumGeoDiameter,
minimumGeoDiameter)
  File "/mnt/windows/svn/networkPlanner/acquisition/libraries/probability_process.py",
line 33, in cluster
    windowLocations = grapeCluster(vectors, iterationCountPerBurst,
maximumPixelDiameter, minimumPixelDiameter)
  File "/mnt/windows/svn/networkPlanner/acquisition/libraries/probability_process.py",
line 66, in grapeCluster
    assignments = scipy.cluster.vq.kmeans2(globalCluster, k=2,
iter=iterationCountPerBurst)[1]
  File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line
563, in kmeans2
    clusters = init(data, k)
  File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line
469, in _krandinit
    x = N.dot(x, N.linalg.cholesky(cov).T) + mu
  File "/usr/lib64/python2.5/site-packages/numpy/linalg/linalg.py",
line 418, in cholesky
    Cholesky decomposition cannot be computed'
numpy.linalg.linalg.LinAlgError: Matrix is not positive definite -
    Cholesky decomposition cannot be computed
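
The traceback ends in _krandinit, the initialization kmeans2() used here
by default, which takes a Cholesky factor of the data covariance. One
possible workaround, sketched below, is to pick the starting centroids
from the observations themselves and to retry when a cluster goes empty.
This assumes a recent SciPy in which kmeans2() accepts minit="matrix" and
missing="raise" and raises ClusterError on an empty cluster; the helper
name kmeans2_from_points and its parameters are made up for illustration
and are not part of SciPy.

```python
import numpy as np
from scipy.cluster.vq import ClusterError, kmeans2


def kmeans2_from_points(data, k, iterations=10, max_attempts=5, seed=0):
    """Hypothetical helper: run kmeans2 from k randomly chosen observations,
    retrying with a new starting set if a cluster becomes empty."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    for _ in range(max_attempts):
        # Use k distinct rows as initial centroids, so no covariance
        # matrix (and no Cholesky factorization) is needed.
        start = data[rng.choice(len(data), size=k, replace=False)]
        try:
            return kmeans2(data, start, iter=iterations,
                           minit="matrix", missing="raise")
        except ClusterError:
            continue  # a cluster went empty; draw another starting set
    raise RuntimeError("every attempt produced an empty cluster")


# Two well-separated blobs: 40 observations (rows), 2 variables (columns).
rng = np.random.default_rng(1)
blobs = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),
                   rng.normal(10.0, 0.5, size=(20, 2))])
centroids, labels = kmeans2_from_points(blobs, k=2)
```

Choosing the starting points by hand keeps the run reproducible through
the seed argument without relying on how a given SciPy version seeds its
own initialization.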



On Fri, Feb 6, 2009 at 9:08 AM,  <josef.pktd@gmail.com> wrote:
> On Fri, Feb 6, 2009 at 8:42 AM, Roy H. Han
> <starsareblueandfaraway@gmail.com> wrote:
>> Thanks, Josef.  This doesn't really answer my question, but thanks for
>> your response.
>>
>>
>> Date: Wed, 4 Feb 2009 12:44:27 -0500
>> From: josef.pktd@gmail.com
>> Subject: Re: [SciPy-user] Mysterious kmeans() error
>> To: SciPy Users List <scipy-user@scipy.org>
>> Message-ID:
>>       <1cd32cbb0902040944m306bbf0bia357c01d0f97fe6d@mail.gmail.com>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> On Wed, Feb 4, 2009 at 12:28 PM, Roy H. Han
>> <starsareblueandfaraway@gmail.com> wrote:
>>> As a side comment, if I use Pycluster, then the clustering proceeds
>>> without error.
>>>
>>> On Wed, Feb 4, 2009 at 11:31 AM, Roy H. Han
>>> <starsareblueandfaraway@gmail.com> wrote:
>>>> Has anyone seen this error before?  I have no idea what it means.  I'm
>>>> using version 0.6.0 packaged for Fedora.
>>>> I'm getting this error using the kmeans2() implementation in scipy.cluster.vq
>>>>
>>>>
>>>>  File "/mnt/windows/svn/networkPlanner/acquisition/libraries/probability_process.py",
>>>> line 55, in grapeCluster
>>>>    assignments = scipy.cluster.vq.kmeans2(globalCluster, k=2,
>>>> iter=iterationCountPerBurst)[1]
>>>>  File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line
>>>> 563, in kmeans2
>>>>    clusters = init(data, k)
>>>>  File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line
>>>> 469, in _krandinit
>>>>    x = N.dot(x, N.linalg.cholesky(cov).T) + mu
>>>>  File "/usr/lib64/python2.5/site-packages/numpy/linalg/linalg.py",
>>>> line 418, in cholesky
>>>>    Cholesky decomposition cannot be computed'
>>>> numpy.linalg.linalg.LinAlgError: Matrix is not positive definite -
>>>>    Cholesky decomposition cannot be computed
>>
>> This is just a general answer; I have never used scipy.cluster.
>>
>> The error message means that the covariance matrix of your data,
>> np.cov(data), is not positive definite. Check your data for linear
>> dependence, e.g. look at the eigenvalues of np.cov(data).
>>
>> If that's not the source of the error, then a cluster expert is needed.
>>
>> Josef
>>
>
> I looked into it a bit more, and I get the same error if the data has
> more columns than rows.
> scipy.cluster assumes that columns represent random variables and rows
> represent observations. So if the matrix is transposed, the same
> exception is raised as in your case.
>
> Josef
>
> BTW: it's better to reply to individual threads than to the Digest,
> since that preserves the subject line and threading.
>
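
The two checks suggested in the quoted replies (look at the eigenvalues
of the covariance, and make sure rows are observations and columns are
variables) can be sketched with NumPy alone. The function name
covariance_report, the returned keys, and the relative tolerance of
1e-10 are assumptions chosen for illustration. Note that rowvar=False is
passed so that np.cov treats columns as variables; a plain np.cov(data)
treats rows as variables instead.

```python
import numpy as np


def covariance_report(data, rel_tol=1e-10):
    """Hypothetical helper: report whether the column covariance of
    `data` looks positive definite."""
    data = np.asarray(data, dtype=float)
    n_obs, n_vars = data.shape
    # Rows are observations, columns are variables.
    cov = np.atleast_2d(np.cov(data, rowvar=False))
    eigenvalues = np.linalg.eigvalsh(cov)  # sorted in ascending order
    # Count eigenvalues that are clearly above rounding noise.
    rank = int(np.sum(eigenvalues > rel_tol * eigenvalues[-1]))
    return {
        "n_obs": n_obs,
        "n_vars": n_vars,
        "min_eigenvalue": float(eigenvalues[0]),
        "rank": rank,
        "positive_definite": rank == n_vars,
    }


rng = np.random.default_rng(0)
tall = rng.normal(size=(50, 3))  # 50 observations, 3 variables
wide = tall.T                    # transposed by mistake: 3 "observations"
# Third column is the sum of the first two, so the columns are dependent.
dependent = np.column_stack([tall[:, 0], tall[:, 1],
                             tall[:, 0] + tall[:, 1]])

for name, array in [("tall", tall), ("wide", wide), ("dependent", dependent)]:
    print(name, covariance_report(array))
```

If positive_definite comes back False, transposing the array or dropping
the dependent columns is worth trying before calling kmeans2() with its
covariance-based initialization.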

