[Numpy-discussion] Faster

Robert Kern robert.kern@gmail....
Fri May 2 21:25:01 CDT 2008

```On Fri, May 2, 2008 at 9:02 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
> On Fri, May 2, 2008 at 6:29 PM, Charles R Harris
>
> <charlesr.harris@gmail.com> wrote:
>
> > Isn't the lengthy part finding the distance between clusters?  I can think
>  > of several ways to do that, but I think you will get a real speedup by doing
>  > that in c or c++. I have a module made in boost python that holds clusters
>  > and returns a list of lists containing their elements. Clusters are joined
>  > by joining any two elements, one from each. It wouldn't take much to add a
>  > distance function, but you could use the list of indices in each cluster to
>  > pull a subset out of the distance matrix and then find the minimum function
>  > in that. This also reminds me of Huffman codes.
>
>  You're right. Finding the distance is slow. Is there any way to speed
>  up the function below? It returns the row and column indices of the
>  min value of the NxN array x.
>
>  def dist(x):
>     x = x + 1e10 * np.eye(x.shape[0])
>     i, j = np.where(x == x.min())
>
>     return i[0], j[0]

Assuming x is contiguous and you can modify x in-place:

In [1]: from numpy import *

In [2]: def dist(x):
...:    x = x + 1e10 * eye(x.shape[0])
...:    i, j = where(x == x.min())
...:    return i[0], j[0]
...:

In [3]: def symmdist(N):
...:     x = random.rand(N, N)
...:     x = x + x.T
...:     x.flat[::N+1] = 0
...:     return x
...:

In [4]: symmdist(5)
Out[4]:
array([[ 0.        ,  0.87508654,  1.11691704,  0.80366071,  0.57966808],
[ 0.87508654,  0.        ,  1.5521685 ,  1.74010886,  0.52156877],
[ 1.11691704,  1.5521685 ,  0.        ,  1.22725396,  1.04101992],
[ 0.80366071,  1.74010886,  1.22725396,  0.        ,  1.94577965],
[ 0.57966808,  0.52156877,  1.04101992,  1.94577965,  0.        ]])

In [5]: def kerndist(x):
...:     N = x.shape[0]
...:     x.flat[::N+1] = x.max()
...:     ij = argmin(x.flat)
...:     i, j = divmod(ij, N)
...:     return i, j
...:

In [10]: x = symmdist(500)

In [15]: %timeit dist(x)
10 loops, best of 3: 19.9 ms per loop

In [16]: %timeit kerndist(x)
100 loops, best of 3: 4.38 ms per loop

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco
```