[Numpy-discussion] Fastest distance matrix calc

Keir Mierle mierle@gmail....
Mon Apr 16 18:31:55 CDT 2007

```On 4/13/07, Timothy Hochberg <tim.hochberg@ieee.org> wrote:
> On 4/13/07, Bill Baxter <wbaxter@gmail.com> wrote:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/498246

[snip]
> I'm going to go out on a limb and contend, without running any timings, that
> for large M and N, a solution using a for loop will beat either of those.
> For example (untested):
>
>  results = empty([M, N], float)
> # You could be fancy and swap axes depending on which array is larger, but
> # I'll leave that for someone else
> for i, v in enumerate(x):
>     results[i] = sqrt(sum((v-y)**2, axis=-1))
>  Or something like that. The reason that I suspect this will be faster is
> that it has better locality, completely finishing a computation on a
> relatively small working set before moving onto the next one. The one liners
> have to pull the potentially large MxN array into the processor repeatedly.

In my experience, it is indeed the case that the for loop version is
faster. The fastest of the three versions offered in the above url is
the last:

from numpy import mat, zeros, newaxis
def calcDistanceMatrixFastEuclidean2(nDimPoints):
nDimPoints = array(nDimPoints)
n,m = nDimPoints.shape
delta = zeros((n,n),'d')
for d in xrange(m):
data = nDimPoints[:,d]
delta += (data - data[:,newaxis])**2
return sqrt(delta)

This is easily extended to two different nDimPoints matricies.

Cheers,
Keir
```