[Numpy-discussion] MemoryError : with scipy.spatial.distance
Wed Apr 4 18:41:51 CDT 2012
Thanks Chris. So I guess the question becomes: how can I efficiently
cluster 1 million (x, y) coordinates?
On Wed, Apr 4, 2012 at 4:35 PM, Chris Barker <email@example.com> wrote:
> On Wed, Apr 4, 2012 at 4:17 PM, Abhishek Pratap wrote:
>> close to 900K points using the DBSCAN algorithm. My input is a list of
>> ~900k tuples, each holding the (x, y) coordinates of one point. I am
>> converting them to a numpy array and passing it to the pdist method of
>> scipy.spatial.distance to calculate the distance between each pair of points.
> I think pdist creates an array that is:
> sum(range(num_points)) in size.
> That's going to be pretty darn big:
> 404999550000 elements
> I think that's about 3 terabytes:
> In : sum(range(900000)) / 1024. / 1024 / 1024 / 1024 * 8
> Out: 2.946759559563361
> (for 64 bit floats)
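For anyone who wants to redo this arithmetic for other sizes, here is a small sketch. It assumes the result holds one value per unordered pair of points, i.e. n*(n-1)/2 entries, which equals sum(range(n)). The function name and the itemsize parameter are made up for illustration; the 4-byte case is only there to show what a float32 result would cost if it were possible.

```python
def pdist_memory_bytes(num_points, itemsize=8):
    """Bytes needed to hold one distance per unordered pair of points.

    itemsize is the size of one element in bytes (8 for 64-bit floats).
    """
    n_pairs = num_points * (num_points - 1) // 2  # same as sum(range(num_points))
    return n_pairs * itemsize


n = 900_000
for label, itemsize in (("float64", 8), ("float32", 4)):
    tib = pdist_memory_bytes(n, itemsize) / 1024**4
    print(f"{label}: {tib:.2f} TiB")
```

Halving the element size only halves the total, so the quadratic growth in the number of pairs is the real problem.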
>> I think the error has something to do with the default double dtype
>> of the numpy array used by the pdist function.
> you *may* be able to get it to use float32 -- but as you can see, that
> probably won't help enough!
> You'll need a different approach!
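One possible direction, offered as a sketch rather than a tested solution: DBSCAN only needs to know which points lie within some radius eps of each other, so a spatial index can answer that without ever building the full distance matrix. The example below uses scipy.spatial.cKDTree and its query_ball_point method. The function name, the random test data, and the eps and min_samples values are all assumptions for illustration, and this covers only the neighborhood-counting step, not a complete DBSCAN.

```python
import numpy as np
from scipy.spatial import cKDTree


def neighbor_counts(points, eps):
    """Count, for each point, the points within distance eps (itself included)."""
    tree = cKDTree(points)
    # One list of neighbor indices per query point; memory grows with the
    # number of close pairs, not with the square of the number of points.
    neighbors = tree.query_ball_point(points, r=eps)
    return np.array([len(idx) for idx in neighbors])


# Hypothetical data: 10,000 random points in the unit square.
rng = np.random.default_rng(0)
points = rng.random((10_000, 2))

eps = 0.01        # assumed neighborhood radius
min_samples = 5   # assumed density threshold
counts = neighbor_counts(points, eps)
core_mask = counts >= min_samples  # candidate "core" points in DBSCAN terms
print(core_mask.sum(), "of", len(points), "points have enough neighbors")
```

How well this scales to ~900k points depends on eps: if the radius is large enough that most points are neighbors of each other, the neighbor lists become huge again.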
> Christopher Barker, Ph.D.
> Emergency Response Division
> NOAA/NOS/OR&R (206) 526-6959 voice
> 7600 Sand Point Way NE (206) 526-6329 fax
> Seattle, WA 98115 (206) 526-6317 main reception