[Numpy-discussion] Faster
Keith Goodman
kwgoodman@gmail....
Sat May 3 19:31:05 CDT 2008
On Sat, May 3, 2008 at 5:05 PM, Christopher Barker
<Chris.Barker@noaa.gov> wrote:
> Robert Kern wrote:
> > I can get a ~20% improvement with the following:
>
>
> > In [9]: def mycut(x, i):
> >    ...:     A = x[:i,:i]
> >    ...:     B = x[:i,i+1:]
> >    ...:     C = x[i+1:,:i]
> >    ...:     D = x[i+1:,i+1:]
> >    ...:     return hstack([vstack([A,C]),vstack([B,D])])
>
> Might it be a touch faster to build the final array first, then fill it:
>
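> import numpy as np
>
> def mycut(x, i):
>     # Copy the four blocks around row i and column i into a preallocated output.
>     r, c = x.shape
>     out = np.empty((r-1, c-1), dtype=x.dtype)
>     out[:i,:i] = x[:i,:i]
>     out[:i,i:] = x[:i,i+1:]
>     out[i:,:i] = x[i+1:,:i]
>     out[i:,i:] = x[i+1:,i+1:]
>     return out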
>
> totally untested.
>
> That should save the creation of two temporaries.
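A quick way to check that the stacking version and the preallocating version agree is to compare both against np.delete, which removes a row and a column directly. This is a minimal sketch: the names cut_hstack and cut_prealloc are illustrative, and np.delete is used only as a reference answer (it makes its own copies, so it is not meant as the fast path). Note that the bottom-right block has to go into out[i:, i:]; out[i:, i+1:] is one column narrower than x[i+1:, i+1:], so that assignment would fail to broadcast.

```python
import numpy as np


def cut_hstack(x, i):
    """Drop row i and column i by stacking the four surrounding blocks."""
    A = x[:i, :i]          # top-left
    B = x[:i, i+1:]        # top-right
    C = x[i+1:, :i]        # bottom-left
    D = x[i+1:, i+1:]      # bottom-right
    return np.hstack([np.vstack([A, C]), np.vstack([B, D])])


def cut_prealloc(x, i):
    """Drop row i and column i by filling a preallocated output array."""
    r, c = x.shape
    out = np.empty((r - 1, c - 1), dtype=x.dtype)
    out[:i, :i] = x[:i, :i]
    out[:i, i:] = x[:i, i+1:]
    out[i:, :i] = x[i+1:, :i]
    out[i:, i:] = x[i+1:, i+1:]
    return out


# Compare both against np.delete on a small test matrix.
x = np.arange(49, dtype=float).reshape(7, 7)
expected = np.delete(np.delete(x, 6, axis=0), 6, axis=1)
assert np.array_equal(cut_hstack(x, 6), expected)
assert np.array_equal(cut_prealloc(x, 6), expected)
```

Both functions also handle the edge cases i = 0 and i = n - 1, where some of the blocks are empty.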
Preallocating the output array makes sense. And it is fast, about 5x faster than the stacking version (mycut2 is the preallocating one):
>> timeit mycut(x, 6)
100 loops, best of 3: 7.48 ms per loop
>> timeit mycut2(x, 6)
1000 loops, best of 3: 1.5 ms per loop
The time it takes to cluster went from about 1.9 seconds to 0.7
seconds! Thank you.
When I run single-linkage clustering on my data, I get one big
cluster and a bunch of tiny ones, so I need to try a different
linkage method. Average linkage sounds good, but it sounds hard to
code.
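Average linkage may be less work than it sounds once the merge loop exists: after merging clusters i and j, only the merged row changes. For single linkage the new distance to another cluster k is min(d(k,i), d(k,j)); for average linkage it is the size-weighted mean (n_i d(k,i) + n_j d(k,j)) / (n_i + n_j), which equals the mean distance over all cross-cluster point pairs (the Lance-Williams update). Below is a naive O(n^3) sketch, assuming a symmetric distance matrix as input. The function agglomerate and its method argument are hypothetical names, and it uses np.delete for the row/column cut where the faster preallocated cut could be dropped in. For real data, scipy.cluster.hierarchy.linkage already supports method='single' and method='average'.

```python
import numpy as np


def agglomerate(dist, n_clusters, method="average"):
    """Naive agglomerative clustering on a symmetric distance matrix.

    Hypothetical helper for illustration. Returns a list of clusters, each a
    list of original point indices.
    """
    if method not in ("single", "average"):
        raise ValueError("method must be 'single' or 'average'")
    d = np.array(dist, dtype=float)    # work on a float copy
    np.fill_diagonal(d, np.inf)        # a cluster is never its own nearest neighbor
    clusters = [[k] for k in range(d.shape[0])]
    while len(clusters) > n_clusters:
        # Closest pair of clusters; order them so that i < j.
        a, b = np.unravel_index(np.argmin(d), d.shape)
        i, j = min(a, b), max(a, b)
        ni, nj = len(clusters[i]), len(clusters[j])
        if method == "single":
            # Single linkage: distance to the nearest member of either cluster.
            new_row = np.minimum(d[i], d[j])
        else:
            # Average linkage (Lance-Williams): size-weighted mean of the two
            # rows, equal to the mean distance over all cross-cluster pairs.
            new_row = (ni * d[i] + nj * d[j]) / (ni + nj)
        d[i, :] = new_row
        d[:, i] = new_row
        d[i, i] = np.inf
        # Drop row and column j: the same cut that mycut performs.
        d = np.delete(np.delete(d, j, axis=0), j, axis=1)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


pts = np.array([0.0, 1.0, 3.0, 6.0, 10.0])
D = np.abs(pts[:, None] - pts[None, :])
print(agglomerate(D, 2, method="average"))  # [[0, 1, 2], [3, 4]]
print(agglomerate(D, 2, method="single"))   # [[0, 1, 2, 3], [4]]
```

On these five points the gaps grow steadily (1, 2, 3, 4), so single linkage keeps chaining the next point onto the existing cluster, while average linkage splits them into two groups of similar size.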
