[Numpy-discussion] MemoryError : with scipy.spatial.distance

Chris Barker chris.barker@noaa....
Wed Apr 4 18:35:55 CDT 2012


On Wed, Apr 4, 2012 at 4:17 PM, Abhishek Pratap wrote:
> close to a 900K points using DBSCAN algo. My input is a list of ~900k
> tuples each having two points (x,y) coordinates. I am converting them
> to numpy array and passing them to pdist method of
> scipy.spatial.distance for calculating distance between each point.

I think pdist creates an array that is:

sum(range(num_points)) in size, i.e. one entry per pair of points.

That's going to be pretty darn big:

404999550000 elements

I think that's about 3 terabytes:

In [41]: sum(range(900000)) / 1024. / 1024 / 1024 / 1024 * 8
Out[41]: 2.946759559563361

(for 64 bit floats)
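
The same estimate can be had without summing a 900k-element range, since sum(range(n)) is just n*(n-1)/2. A minimal sketch, assuming pdist returns one distance per pair of points; the helper names condensed_size and storage_tib are made up for illustration:

```python
def condensed_size(num_points):
    """Number of pairwise distances among num_points points: n*(n-1)/2."""
    return num_points * (num_points - 1) // 2


def storage_tib(num_points, bytes_per_element):
    """Storage for all pairwise distances, in units of 1024**4 bytes."""
    return condensed_size(num_points) * bytes_per_element / 1024.0 ** 4


n = 900000
print(condensed_size(n))   # number of elements
print(storage_tib(n, 8))   # 64 bit floats
print(storage_tib(n, 4))   # 32 bit floats: half of the above
```

Halving the element size only halves the total, so a 32 bit result would still be far larger than typical RAM.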


> I think the error has something to do with the default double dtype
> of numpy array of pdist function.

you *may* be able to get it to use float32 -- but as you can see, that
probably won't help enough!

You'll need a different approach!
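
One possible direction, offered only as a sketch and not as what pdist or any particular DBSCAN implementation does: DBSCAN only needs the neighbours of each point within some radius eps, so the full distance matrix never has to exist. Bucketing the points into grid cells of side eps means each point is compared only against points in its own and the adjacent cells. The function names, the sample points and the eps value below are all made up for illustration; for real data a spatial index such as a KD-tree would be the more usual tool.

```python
from collections import defaultdict
from math import floor


def build_grid(points, eps):
    """Map each grid cell (side length eps) to the indices of the points in it."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(int(floor(x / eps)), int(floor(y / eps)))].append(idx)
    return grid


def neighbors_within(points, grid, idx, eps):
    """Indices of all other points within distance eps of points[idx]."""
    x, y = points[idx]
    cx, cy = int(floor(x / eps)), int(floor(y / eps))
    found = []
    # Any point within eps must lie in the same cell or one of the 8 adjacent cells.
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in grid.get((cx + dx, cy + dy), ()):
                if j == idx:
                    continue
                px, py = points[j]
                if (px - x) ** 2 + (py - y) ** 2 <= eps * eps:
                    found.append(j)
    return sorted(found)


# Hypothetical sample data, just to show the call pattern.
points = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0), (0.0, 0.9)]
grid = build_grid(points, 1.0)
print(neighbors_within(points, grid, 0, 1.0))
```

Memory here grows with the number of points plus the number of occupied cells, rather than with the number of pairs.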

-Chris



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov

