[SciPy-user] Mysterious kmeans() error
Roy H. Han
starsareblueandfaraway@gmail....
Fri Feb 6 08:29:11 CST 2009
Thanks, Josef.
It seems to happen when one of the clusters becomes empty.
Pycluster never seems to run into empty clusters, though.
/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py:477: UserWarning: One of the clusters is empty. Re-run kmean with a different initialization.
  warnings.warn("One of the clusters is empty. "
Traceback (most recent call last):
  File "clusterProbabilities.py", line 88, in <module>
    run(taskName, parameterByName)
  File "clusterProbabilities.py", line 57, in run
    locationGeoFrame = probability_process.cluster(targetLocationPath, probabilityPath, iterationCountPerBurst, maximumGeoDiameter, minimumGeoDiameter)
  File "/mnt/windows/svn/networkPlanner/acquisition/libraries/probability_process.py", line 33, in cluster
    windowLocations = grapeCluster(vectors, iterationCountPerBurst, maximumPixelDiameter, minimumPixelDiameter)
  File "/mnt/windows/svn/networkPlanner/acquisition/libraries/probability_process.py", line 66, in grapeCluster
    assignments = scipy.cluster.vq.kmeans2(globalCluster, k=2, iter=iterationCountPerBurst)[1]
  File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line 563, in kmeans2
    clusters = init(data, k)
  File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line 469, in _krandinit
    x = N.dot(x, N.linalg.cholesky(cov).T) + mu
  File "/usr/lib64/python2.5/site-packages/numpy/linalg/linalg.py", line 418, in cholesky
    Cholesky decomposition cannot be computed'
numpy.linalg.linalg.LinAlgError: Matrix is not positive definite - Cholesky decomposition cannot be computed
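The traceback shows the failure inside the default random initialization (`init(data, k)` calling `_krandinit`, which takes a Cholesky factor of the data's covariance), and the warning above it advises re-running with a different initialization. A minimal sketch of that advice, assuming a recent SciPy release in which `kmeans2` accepts `minit="points"` (start from k randomly chosen rows of the data) and `missing="raise"` (raise `ClusterError` instead of warning when a cluster ends up empty); the 0.6.0 package mentioned in this thread may not support these options. The helper `kmeans2_with_retries` and its `max_tries` parameter are hypothetical, not part of SciPy.

```python
import numpy as np
from scipy.cluster.vq import ClusterError, kmeans2


def kmeans2_with_retries(data, k, max_tries=10, **kwargs):
    """Re-run kmeans2 until no cluster comes back empty (hypothetical helper).

    minit="points" seeds the centroids with k randomly chosen rows of `data`,
    so no covariance matrix or Cholesky factor is computed. missing="raise"
    turns the "One of the clusters is empty" warning into a ClusterError.
    """
    last_error = None
    for _ in range(max_tries):
        try:
            return kmeans2(data, k, minit="points", missing="raise", **kwargs)
        except ClusterError as exc:
            # Without a fixed seed, each retry draws a new random set of starting rows
            last_error = exc
    raise ClusterError(f"all {max_tries} attempts produced an empty cluster") from last_error


# Two well-separated groups of 1-D observations, stored as an (8, 1) array
data = np.array([0, 1, 2, 3, 100, 101, 102, 103], dtype=float).reshape(-1, 1)
centroids, labels = kmeans2_with_retries(data, 2)
print(centroids.ravel(), labels)
```

Seeding from data rows sidesteps the Cholesky step entirely, while the retry loop handles the empty-cluster case that Pycluster apparently copes with internally.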
On Fri, Feb 6, 2009 at 9:08 AM, <josef.pktd@gmail.com> wrote:
> On Fri, Feb 6, 2009 at 8:42 AM, Roy H. Han
> <starsareblueandfaraway@gmail.com> wrote:
>> Thanks, Josef. This doesn't really answer my question, but thanks for
>> your response.
>>
>>
>> Date: Wed, 4 Feb 2009 12:44:27 -0500
>> From: josef.pktd@gmail.com
>> Subject: Re: [SciPy-user] Mysterious kmeans() error
>> To: SciPy Users List <scipy-user@scipy.org>
>>
>> On Wed, Feb 4, 2009 at 12:28 PM, Roy H. Han
>> <starsareblueandfaraway@gmail.com> wrote:
>>> As a side comment, if I use Pycluster, then the clustering proceeds
>>> without error.
>>>
>>> On Wed, Feb 4, 2009 at 11:31 AM, Roy H. Han
>>> <starsareblueandfaraway@gmail.com> wrote:
>>>> Has anyone seen this error before? I have no idea what it means. I'm
>>>> using version 0.6.0 packaged for Fedora.
>>>> I'm getting this error using the kmeans2() implementation in scipy.cluster.vq:
>>>>
>>>>
>>>> File "/mnt/windows/svn/networkPlanner/acquisition/libraries/probability_process.py",
>>>> line 55, in grapeCluster
>>>> assignments = scipy.cluster.vq.kmeans2(globalCluster, k=2,
>>>> iter=iterationCountPerBurst)[1]
>>>> File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line
>>>> 563, in kmeans2
>>>> clusters = init(data, k)
>>>> File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line
>>>> 469, in _krandinit
>>>> x = N.dot(x, N.linalg.cholesky(cov).T) + mu
>>>> File "/usr/lib64/python2.5/site-packages/numpy/linalg/linalg.py",
>>>> line 418, in cholesky
>>>> Cholesky decomposition cannot be computed'
>>>> numpy.linalg.linalg.LinAlgError: Matrix is not positive definite -
>>>> Cholesky decomposition cannot be computed
>>
>> This is just a general answer; I have never used scipy.cluster.
>>
>> The error message means that the covariance matrix of your data,
>> np.cov(data, rowvar=0), is not positive definite. Check your data for
>> linear dependence between columns, e.g. by looking at the eigenvalues
>> of that covariance matrix.
>>
>> If that's not the source of the error, then a cluster expert is needed.
>>
>> Josef
>>
>
> I looked into this a bit more, and I get the same error if the data has
> more columns than rows. scipy.cluster assumes that columns represent
> random variables and rows represent observations. With fewer
> observations than variables the covariance matrix is singular, so
> passing the transposed matrix raises the same exception as in your case.
>
> Josef
>
> BTW: it's better to reply to individual threads than to the Digest,
> since that preserves the subject line and threading.
>
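Both of Josef's diagnoses (linear dependence between columns, and a transposed matrix with more columns than rows) can be checked directly. The sketch below assumes NumPy 1.17 or later (for `np.random.default_rng`) and the convention stated in the thread: rows are observations, columns are variables. `gaussian_random_init` is a hypothetical re-implementation modeled on the traceback line `x = N.dot(x, N.linalg.cholesky(cov).T) + mu`, not SciPy's actual `_krandinit`; `covariance_diagnostics` is likewise hypothetical. In both problem cases below the covariance matrix is singular, and the Cholesky step fails.

```python
import numpy as np


def gaussian_random_init(data, k, rng=None):
    """Hypothetical sketch of a Gaussian random initialization for k-means.

    Draws k starting centroids from a normal distribution with the data's
    mean and covariance. Rows of `data` are observations, columns variables.
    """
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(rng)
    mu = data.mean(axis=0)
    # rowvar=False: covariance between columns (variables)
    cov = np.atleast_2d(np.cov(data, rowvar=False))
    # Raises np.linalg.LinAlgError unless cov is positive definite
    chol = np.linalg.cholesky(cov)
    z = rng.standard_normal((k, mu.size))
    return z @ chol.T + mu


def covariance_diagnostics(data):
    """Return (n_observations, n_variables, rank, eigenvalues) of the column covariance."""
    data = np.asarray(data, dtype=float)
    n_obs, n_vars = data.shape
    cov = np.atleast_2d(np.cov(data, rowvar=False))
    # eigvalsh: cov is symmetric, so its eigenvalues are real (ascending order)
    return n_obs, n_vars, np.linalg.matrix_rank(cov), np.linalg.eigvalsh(cov)


# Five observations (rows) of two variables (columns)
good = np.array([[0, 0], [1, 2], [2, 1], [3, 3], [4, 0]], dtype=float)
# Case 1: a constant column has zero variance, so cov is singular
constant_col = np.column_stack([good, np.full(len(good), 7.0)])
# Case 2: the transposed matrix looks like 2 observations of 5 variables
transposed = good.T

for name, arr in [("good", good), ("constant column", constant_col), ("transposed", transposed)]:
    n_obs, n_vars, rank, eig = covariance_diagnostics(arr)
    print(f"{name}: {n_obs} obs x {n_vars} vars, rank {rank}, smallest eigenvalue {eig[0]:.3g}")
    try:
        gaussian_random_init(arr, k=2, rng=0)
        print("  Cholesky succeeded")
    except np.linalg.LinAlgError as exc:
        print("  Cholesky failed:", exc)
```

A rank below the number of columns (equivalently, a smallest eigenvalue of zero up to rounding) means the covariance matrix is not positive definite, so `numpy.linalg.cholesky` is expected to fail. Dropping constant or linearly dependent columns, or transposing data that was stored with observations in columns, removes the problem.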