[Numpy-discussion] distance_matrix: how to speed up?

Rob Hetland hetland@tamu....
Thu May 22 04:56:21 CDT 2008


On May 22, 2008, at 9:45 AM, Vincent Schut wrote:

> Reading this thread, I remembered having tried scipy's sandbox.rbf
> (radial basis function) to interpolate a pretty large,  
> multidimensional
> dataset, to fill in the missing data points. This however failed soon
> with out-of-memory errors, which, if I remember correctly, came from  
> the
> pretty straightforward distance calculation between the different data
> points that is used in this package. Being no math wonder, I assumed
> that there simply was no simple way to calculate distances without  
> using
> much memory, and ended my rbf experiments.
>
> To make a story short: correct me if I am wrong, but might it be an  
> idea
> to use the above solution in scipy.sandbox.rbf?


Yes, this would be a very good substitution.  Not only does it use  
less memory, but in my quick tests it is about as fast or faster.   
Really, though, both are pretty quick.  There will still be memory  
limitations, but you only need to store a matrix of (N, M), instead of  
(NDIM, N, M), so for many dimensions there will be big memory  
improvements.  Probably only small improvements for 3 dimensions or  
less.

I'm not sure where rbf lives anymore -- it's not in scikits.  I have  
my own version (parts of which got folded into the old scipy.sandbox  
version), that I would be willing to share if there is interest.

Really, though, the rbf toolbox will not be limited by the memory of  
the distance matrix.  Later on, you need to do a large linear algebra  
'solve', like this:


r = norm(x, x) # The distances between all of the ND points to each  
other.
A = psi(r)  # where psi is some divergent function, often the  
multiquadratic function : sqrt((self.epsilon*r)**2 + 1)
coefs = linalg.solve(A, data)  # where data is the length of x, one  
data point for each spatial point.

# to find the interpolated data points at xi
ri = norm(xi, x)
Ai = psi(ri)
di = dot(Ai, coefs)


All in all, it is the 'linalg.solve' that kills you.

-Rob

----
Rob Hetland, Associate Professor
Dept. of Oceanography, Texas A&M University
http://pong.tamu.edu/~rob
phone: 979-458-0096, fax: 979-845-6331



More information about the Numpy-discussion mailing list