[SciPy-user] Mysterious kmeans() error
Roy H. Han
starsareblueandfaraway@gmail....
Fri Feb 6 08:37:23 CST 2009
Well, I suspect there are numerical problems with scipy's kmeans2(),
at least in the 0.6.0 version of scipy.
I changed my code to try to ensure that no clusters were empty.
Pycluster seems to be the better clustering implementation for now.
Each vector has three components (number of columns = 3), but kmeans
should still work even if one of the clusters contains only a single
vector (number of rows = 1).
This is a bug.
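
For anyone who wants to check their own data, here is a minimal sketch of
the diagnosis discussed in this thread. It assumes rows are observations
and columns are variables, and the helper name covariance_report and its
tolerance are made up for illustration; they are not part of scipy. It
only reports whether the column covariance is positive definite, which is
what the Cholesky step in the traceback needs.

```python
import numpy as np

def covariance_report(data, rel_tol=1e-10):
    """Report whether the column covariance of `data` is positive definite.

    Hypothetical helper for illustration; not part of scipy.
    """
    data = np.asarray(data, dtype=float)
    n_obs, n_vars = data.shape
    # rowvar=False: each row is an observation, each column a variable.
    cov = np.atleast_2d(np.cov(data, rowvar=False))
    # eigvalsh is meant for symmetric matrices and returns ascending values.
    eigenvalues = np.linalg.eigvalsh(cov)
    smallest, largest = eigenvalues[0], eigenvalues[-1]
    # Treat eigenvalues that are tiny relative to the largest one as zero.
    positive_definite = bool(largest > 0 and smallest > rel_tol * largest)
    return {
        "n_obs": n_obs,
        "n_vars": n_vars,
        "rank": int(np.linalg.matrix_rank(cov)),
        "smallest_eigenvalue": float(smallest),
        "positive_definite": positive_definite,
    }

# Two observations of three variables: the 3x3 covariance has rank 1,
# so a Cholesky factorization of it cannot be relied on.
few_rows = np.array([[0.0, 0.0, 0.0],
                     [1.0, 2.0, 3.0]])
print(covariance_report(few_rows))
```

If positive_definite comes back False, the data passed to the random
initialization either has too few rows for its number of columns or has
linearly dependent columns.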
On Fri, Feb 6, 2009 at 9:29 AM, Roy H. Han
<starsareblueandfaraway@gmail.com> wrote:
> Thanks, Josef.
>
> It seems that it happens when one of the clusters becomes empty.
> Pycluster never seems to have the problem of empty clusters though.
>
>
> /usr/lib64/python2.5/site-packages/scipy/cluster/vq.py:477:
> UserWarning: One of the clusters is empty. Re-run kmean with a
> different initialization.
> warnings.warn("One of the clusters is empty. "
>
> Traceback (most recent call last):
> File "clusterProbabilities.py", line 88, in <module>
> run(taskName, parameterByName)
> File "clusterProbabilities.py", line 57, in run
> locationGeoFrame = probability_process.cluster(targetLocationPath,
> probabilityPath, iterationCountPerBurst, maximumGeoDiameter,
> minimumGeoDiameter)
> File "/mnt/windows/svn/networkPlanner/acquisition/libraries/probability_process.py",
> line 33, in cluster
> windowLocations = grapeCluster(vectors, iterationCountPerBurst,
> maximumPixelDiameter, minimumPixelDiameter)
> File "/mnt/windows/svn/networkPlanner/acquisition/libraries/probability_process.py",
> line 66, in grapeCluster
> assignments = scipy.cluster.vq.kmeans2(globalCluster, k=2,
> iter=iterationCountPerBurst)[1]
> File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line
> 563, in kmeans2
> clusters = init(data, k)
> File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line
> 469, in _krandinit
> x = N.dot(x, N.linalg.cholesky(cov).T) + mu
> File "/usr/lib64/python2.5/site-packages/numpy/linalg/linalg.py",
> line 418, in cholesky
> Cholesky decomposition cannot be computed'
> numpy.linalg.linalg.LinAlgError: Matrix is not positive definite -
> Cholesky decomposition cannot be computed
>
>
>
> On Fri, Feb 6, 2009 at 9:08 AM, <josef.pktd@gmail.com> wrote:
>> On Fri, Feb 6, 2009 at 8:42 AM, Roy H. Han
>> <starsareblueandfaraway@gmail.com> wrote:
>>> Thanks, Josef. This doesn't really answer my question, but thanks for
>>> your response.
>>>
>>>
>>> Date: Wed, 4 Feb 2009 12:44:27 -0500
>>> From: josef.pktd@gmail.com
>>> Subject: Re: [SciPy-user] Mysterious kmeans() error
>>> To: SciPy Users List <scipy-user@scipy.org>
>>> Message-ID:
>>> <1cd32cbb0902040944m306bbf0bia357c01d0f97fe6d@mail.gmail.com>
>>> Content-Type: text/plain; charset=ISO-8859-1
>>>
>>> On Wed, Feb 4, 2009 at 12:28 PM, Roy H. Han
>>> <starsareblueandfaraway@gmail.com> wrote:
>>>> As a side comment, if I use Pycluster, then the clustering proceeds
>>>> without error.
>>>>
>>>> On Wed, Feb 4, 2009 at 11:31 AM, Roy H. Han
>>>> <starsareblueandfaraway@gmail.com> wrote:
>>>>> Has anyone seen this error before? I have no idea what it means. I'm
>>>>> using version 0.6.0 packaged for Fedora.
>>>>> I'm getting this error using the kmeans2() implementation in scipy.cluster.vq
>>>>>
>>>>>
>>>>> File "/mnt/windows/svn/networkPlanner/acquisition/libraries/probability_process.py",
>>>>> line 55, in grapeCluster
>>>>> assignments = scipy.cluster.vq.kmeans2(globalCluster, k=2,
>>>>> iter=iterationCountPerBurst)[1]
>>>>> File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line
>>>>> 563, in kmeans2
>>>>> clusters = init(data, k)
>>>>> File "/usr/lib64/python2.5/site-packages/scipy/cluster/vq.py", line
>>>>> 469, in _krandinit
>>>>> x = N.dot(x, N.linalg.cholesky(cov).T) + mu
>>>>> File "/usr/lib64/python2.5/site-packages/numpy/linalg/linalg.py",
>>>>> line 418, in cholesky
>>>>> Cholesky decomposition cannot be computed'
>>>>> numpy.linalg.linalg.LinAlgError: Matrix is not positive definite -
>>>>> Cholesky decomposition cannot be computed
>>>
>>> This is just a general answer; I have never used scipy.cluster.
>>>
>>> The error message means that the covariance matrix of your data,
>>> np.cov(data), is not positive definite. Check your data for linear
>>> dependence, e.g. look at the eigenvalues of np.cov(data).
>>>
>>> If that's not the source of the error, then a cluster expert is needed.
>>>
>>> Josef
>>>
>>
>> I looked a bit more, and I get the same error if the data has more
>> columns than rows.
>> scipy.cluster assumes that columns represent random variables and
>> rows represent observations. So if the matrix is transposed, the
>> same exception is raised as in your case.
>>
>> Josef
>>
>> BTW: it's better to reply to individual threads than to the Digest,
>> since that preserves the subject line and threading.
>>
>
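
A possible workaround, sketched under some assumptions: the traceback
shows the failure inside the random initialization (_krandinit), so
starting from existing data points avoids the covariance and Cholesky
step entirely. This assumes a scipy version where kmeans2() accepts
minit='points'. The wrapper name safe_kmeans2 is hypothetical, and it
does not prevent empty clusters; it only sidesteps the initialization
error.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def safe_kmeans2(data, k, iterations=10):
    """Run kmeans2 with initial centroids drawn from the data itself.

    Hypothetical wrapper for illustration; not part of scipy.
    """
    data = np.asarray(data, dtype=float)
    if data.ndim != 2:
        raise ValueError("expected a 2-D array: rows are observations, "
                         "columns are variables")
    n_obs, n_vars = data.shape
    if n_obs < k:
        raise ValueError("need at least k observations (rows)")
    # minit='points' picks k rows of `data` as starting centroids, so no
    # covariance matrix is estimated and no Cholesky factorization runs.
    return kmeans2(data, k, iter=iterations, minit='points')

# Four observations (rows) of three variables (columns).
vectors = np.array([[0.0, 0.0, 0.0],
                    [0.1, 0.0, 0.0],
                    [5.0, 5.0, 5.0],
                    [5.1, 5.0, 5.0]])
centroids, assignments = safe_kmeans2(vectors, k=2)
print(assignments)
```

The shape check is there because, as noted above, a transposed matrix
(variables in rows) is an easy way to trigger the same exception.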