[Numpy-discussion] Faster
Charles R Harris
charlesr.harris@gmail....
Fri May 2 20:29:51 CDT 2008
On Fri, May 2, 2008 at 7:16 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
> On Fri, May 2, 2008 at 6:05 PM, Robert Kern <robert.kern@gmail.com> wrote:
> > On Fri, May 2, 2008 at 7:24 PM, Keith Goodman <kwgoodman@gmail.com>
> wrote:
> >
> >
> > > How can I make this function faster? It removes the i-th row and
> > > column from an array.
> > >
> > > def cut(x, i):
> > > idx = range(x.shape[0])
> > > idx.remove(i)
> > > y = x[idx,:]
> > > y = y[:,idx]
> > > return y
> > >
> > > >> import numpy as np
> > > >> x = np.random.rand(500,500)
> > > >> timeit cut(x, 100)
> > > 100 loops, best of 3: 16.8 ms per loop
> >
> > I can get a ~20% improvement with the following:
> >
> > In [8]: %timeit cut(x, 100)
> > 10 loops, best of 3: 21.6 ms per loop
> >
> > In [9]: def mycut(x, i):
> > ...: A = x[:i,:i]
> > ...: B = x[:i,i+1:]
> > ...: C = x[i+1:,:i]
> > ...: D = x[i+1:,i+1:]
> > ...: return hstack([vstack([A,C]),vstack([B,D])])
> > ...:
> >
> > In [10]: %timeit mycut(x, 100)
> > 10 loops, best of 3: 17.3 ms per loop
> >
> > The hstack(vstack, vstack) seems to be somewhat better than
> > vstack(hstack, hstack), at least for these sizes.
>
> Wow. I never would have come up with that. And I probably never will.
>
> Original code:
> >> cluster.test2(500)
> n = 500 took 5.28 seconds
>
> Your improvement:
> >> cluster.test2(500)
> n = 500 took 3.52 seconds
>
> Much more than a 20% improvement when used in the larger program. Thank
> you.
Isn't the lengthy part finding the distance between clusters? I can think
of several ways to do that, but I think you will get a real speedup by doing
that in c or c++. I have a module made in boost python that holds clusters
and returns a list of lists containing their elements. Clusters are joined
by joining any two elements, one from each. It wouldn't take much to add a
distance function, but you could use the list of indices in each cluster to
pull a subset out of the distance matrix and then find the minimum function
in that. This also reminds me of Huffman codes.
Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://projects.scipy.org/pipermail/numpy-discussion/attachments/20080502/fff8937c/attachment.html
More information about the Numpy-discussion
mailing list