[Numpy-discussion] MemoryError : with scipy.spatial.distance
Thu Apr 5 00:33:51 CDT 2012
On Wed, Apr 04, 2012 at 04:41:51PM -0700, Abhishek Pratap wrote:
> Thanks Chris. So I guess the question becomes how can I efficiently
> cluster 1 million x,y coordinates.
Did you try scikit-learn's implementation of DBSCAN? I am not sure
that it scales, but it's worth trying.
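For reference, a minimal sketch of what trying DBSCAN could look like. The
synthetic points, the eps value and the min_samples value are assumptions
made up for illustration, not values from this thread; on the real data they
would need tuning, and memory use at a million points will depend on how many
neighbours fall within eps.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic stand-in for the x,y coordinates: three tight blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.1 * rng.standard_normal((1000, 2)) for c in centers])

# eps and min_samples are illustrative guesses, not tuned values.
db = DBSCAN(eps=0.3, min_samples=10).fit(X)
labels = db.labels_  # one label per point; -1 marks points treated as noise

# Count clusters, ignoring the noise label if it is present.
n_clusters = len(set(labels.tolist())) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters)
```

Running this on a random subsample of the coordinates first is a cheap way to
see whether the parameters give sensible clusters before trying the full set.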
Alternatively, the best way to cluster massive datasets is to use the
mini-batch implementation of KMeans (MiniBatchKMeans in scikit-learn).
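A rough sketch of the mini-batch approach, assuming a recent scikit-learn.
The data here is synthetic, and n_clusters, batch_size and n_init are
placeholder values chosen for the example; unlike DBSCAN, the number of
clusters has to be chosen up front.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Synthetic stand-in for the x,y coordinates: three blobs of 10,000 points.
rng = np.random.default_rng(0)
true_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((10_000, 2)) for c in true_centers])

# Each update step uses only batch_size points, so the cost of a step
# does not grow with the total number of samples.
mbk = MiniBatchKMeans(n_clusters=3, batch_size=1000, n_init=3, random_state=0)
mbk.fit(X)

labels = mbk.labels_                  # cluster index for every point
found_centers = mbk.cluster_centers_  # array of shape (3, 2)
print(found_centers)
```

If the full array does not fit in memory, the same estimator also has a
partial_fit method that can be fed one chunk of points at a time.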
Hope this helps,