[Numpy-discussion] MemoryError : with scipy.spatial.distance
Chris Barker
chris.barker@noaa....
Wed Apr 4 18:35:55 CDT 2012
On Wed, Apr 4, 2012 at 4:17 PM, Abhishek Pratap wrote:
> close to a 900K points using DBSCAN algo. My input is a list of ~900k
> tuples each having two points (x,y) coordinates. I am converting them
> to numpy array and passing them to pdist method of
> scipy.spatial.distance for calculating distance between each point.
I think pdist creates an array that is:
sum(range(num_points)) in size, i.e. n*(n-1)/2 elements for n points.
That's going to be pretty darn big:
404999550000 elements
I think that's about 3 terabytes:
In [41]: sum(range(900000)) / 1024. / 1024 / 1024 / 1024 * 8
Out[41]: 2.946759559563361
(for 64 bit floats)
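
The same arithmetic as a small helper, in case you want to check other point counts or dtypes. This is only a sketch: condensed_size is a made-up name, not part of numpy or scipy, and it assumes the result holds one distance per unordered pair of points.

```python
def condensed_size(n_points, itemsize=8):
    """Return (element count, byte count) for one distance per unordered pair.

    condensed_size is a hypothetical helper, not a numpy/scipy function.
    itemsize is the number of bytes per element (8 for float64, 4 for float32).
    """
    n_elements = n_points * (n_points - 1) // 2  # same as sum(range(n_points))
    return n_elements, n_elements * itemsize


n_elements, n_bytes = condensed_size(900000)
tib_float64 = n_bytes / 1024.0 ** 4

# Same count with 4-byte floats: exactly half the memory.
_, n_bytes32 = condensed_size(900000, itemsize=4)
tib_float32 = n_bytes32 / 1024.0 ** 4

print(n_elements, tib_float64, tib_float32)
```

Halving the item size only gets you to roughly 1.5 terabytes, so the dtype is not the real problem; the number of pairs is.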
> I think the error has something to do with the default double dtype
> of numpy array of pdist function.
You *may* be able to get it to use float32 -- but as you can see, that
would only halve the size to about 1.5 terabytes, which won't help enough!
You'll need a different approach!
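
One possibility, offered only as a sketch and not something I have tried on your data: DBSCAN only needs to know which points lie within some radius of each other, so a spatial index can answer that without ever building the full distance array. The example below uses scipy.spatial.cKDTree on random stand-in data; the point count and the eps value are made up for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

# Stand-in data: 1000 random 2-D points (assumption, not the real input).
rng = np.random.RandomState(0)
points = rng.rand(1000, 2)

eps = 0.05  # hypothetical neighborhood radius; pick one that suits your data

# Build the tree once, then ask for every point's neighbors within eps.
tree = cKDTree(points)
neighbors = tree.query_ball_point(points, r=eps)

# Each entry is a list of indices, including the query point itself.
counts = np.array([len(nb) for nb in neighbors])
print(counts.min(), counts.max())
```

Memory here grows with the number of neighbor pairs actually inside the radius, not with all n*(n-1)/2 pairs, so how well it scales to 900K points depends on eps and on how dense the data is.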
-Chris
