[Numpy-discussion] Faster

Hoyt Koepke hoytak@gmail....
Sat May 3 19:56:15 CDT 2008


You could also try complete linkage, where you merge two clusters
based on the farthest distance between points in the two clusters
instead of the smallest.  This tends to give clusters of similar size
(which isn't always ideal, either).  Like single linkage, it only
needs the cluster-to-cluster distances you already store: the distance
from a merged cluster to any other cluster is the max of the two old
distances rather than the min.  So it should be trivial to change your
code to use that merge criterion if you want to try it.
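
For what it's worth, here is a rough sketch of what that update could
look like.  I'm assuming you keep a full symmetric distance matrix
with zeros on the diagonal; merge_clusters and its arguments are names
I made up for illustration, not anything from your code, and it uses
np.delete instead of your mycut for brevity.

```python
import numpy as np

def merge_clusters(d, i, j, linkage="complete"):
    """Merge clusters i and j of a symmetric distance matrix d.

    Hypothetical helper: returns a new (n-1, n-1) matrix in which
    index i holds the merged cluster and row/column j is removed.
    """
    # complete linkage keeps the farthest distance, single the nearest
    combine = np.maximum if linkage == "complete" else np.minimum
    d = d.copy()
    merged = combine(d[i], d[j])  # merged cluster vs. every cluster
    d[i, :] = merged
    d[:, i] = merged
    d[i, i] = 0.0                 # distance to itself stays zero
    # drop row and column j
    return np.delete(np.delete(d, j, axis=0), j, axis=1)

# four points on a line at 0, 1, 3, 7
pts = np.array([0.0, 1.0, 3.0, 7.0])
d = np.abs(pts[:, None] - pts[None, :])

complete = merge_clusters(d, 0, 1, linkage="complete")
single = merge_clusters(d, 0, 1, linkage="single")
```

The only difference between the two criteria is the max versus the
min, so one function can serve both while you compare results.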

--Hoyt





On Sat, May 3, 2008 at 5:31 PM, Keith Goodman <kwgoodman@gmail.com> wrote:
> On Sat, May 3, 2008 at 5:05 PM, Christopher Barker
>  <Chris.Barker@noaa.gov> wrote:
>
> > Robert Kern wrote:
>  >  > I can get a ~20% improvement with the following:
>  >
>  >
>  > > In [9]: def mycut(x, i):
>  >  >    ...:     A = x[:i,:i]
>  >  >    ...:     B = x[:i,i+1:]
>  >  >    ...:     C = x[i+1:,:i]
>  >  >    ...:     D = x[i+1:,i+1:]
>  >  >    ...:     return hstack([vstack([A,C]),vstack([B,D])])
>  >
>  >  Might it be a touch faster to build the final array first, then fill it:
>  >
>  >  def mycut(x, i):
>  >      r,c = x.shape
>  >      out = np.empty((r-1, c-1), dtype=x.dtype)
>  >      out[:i,:i] = x[:i,:i]
>  >      out[:i,i:] = x[:i,i+1:]
>  >      out[i:,:i] = x[i+1:,:i]
>  >      out[i:,i:] = x[i+1:,i+1:]
>  >      return out
>  >
>  >  totally untested.
>  >
>  >  That should save the creation of two temporaries.
>
>  Initializing the array makes sense. And it is super fast:
>
>  >> timeit mycut(x, 6)
>  100 loops, best of 3: 7.48 ms per loop
>  >> timeit mycut2(x, 6)
>  1000 loops, best of 3: 1.5 ms per loop
>
>  The time it takes to cluster went from about 1.9 seconds to 0.7
>  seconds! Thank you.
>
>  When I run the single linkage clustering on my data I get one big
>  cluster and a bunch of tiny clusters. So I need to try a different
>  linkage method. Average linkage sounds good, but it sounds hard to
>  code.
>
>



-- 
+++++++++++++++++++++++++++++++++++
Hoyt Koepke
UBC Department of Computer Science
http://www.cs.ubc.ca/~hoytak/
hoytak@gmail.com
+++++++++++++++++++++++++++++++++++

