[Numpy-discussion] Distance Matrix speed
Tim Hochberg
tim.hochberg at cox.net
Mon Jun 19 15:28:53 CDT 2006
Sebastian Beca wrote:
>I just ran Alan's script and I don't get consistent results for 100
>repetitions. I boosted it to 1000 and ran it several times. The
>faster one varied a lot, but both came within about ±1.5% of each other.
>
>When it comes to scaling, for my problem(fuzzy clustering), N is the
>size of the dataset, which should span from thousands to millions. C
>is the amount of clusters, usually less than 10, and K the amount of
>features (the dimension I want to sum over) is also usually less than
>100. So mainly I'm concerned with scaling across N. I tried C=3, K=4,
>N=1000, 2500, 5000, 7500, 10000. Also using 1000 runs, the results
>were:
>dist_beca: 1.1, 4.5, 16, 28, 37
>dist_loehner1: 1.7, 6.5, 22, 35, 47
>
>I also tried scaling across K, with C=3, N=2500, and K=5-50. I
>couldn't get any consistent results for small K, but both tend to
>perform as well (+-2%) for large K (K>15).
>
>I'm not sure how these work in the backend, so I can't argue as to
>why one should scale better than the other.
>
>
The reason I suspect that dist_beca should scale better is that
dist_loehner1 generates an intermediate array of size NxCxK, while
dist_beca produces intermediate matrices that are only NxK or CxK. For
large problems, allocating that extra memory and fetching it into and
out of the cache can be a bottleneck.
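[Editor's note: a minimal illustration of the size difference described above, using the array names and shapes from the script quoted below; it is not part of the original mail.]

```python
import numpy as np

N, C, K = 1000, 10, 100
A = np.random.random((N, K))
B = np.random.random((C, K))

# Broadcasting A against B materializes a full N x C x K intermediate:
d3 = A[:, np.newaxis, :] - B[np.newaxis, :, :]
assert d3.shape == (N, C, K)   # 1,000,000 elements alive at once

# Looping over the C clusters only ever holds an N x K intermediate:
xy = A - B[0]
assert xy.shape == (N, K)      # 100,000 elements per iteration
```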
Here's another version that allocates even less in the way of
temporaries, at the expense of being borderline incomprehensible. It
still allocates an NxK temporary array, but it allocates it once ahead
of time and then reuses it for all subsequent calculations. You're welcome
to use it, but I'm not sure I'd recommend it unless this function is
really a speed bottleneck, as it could end up being hard to read later (I
left implementing the N < C case as an exercise for the reader...).
I have another idea that might reduce the memory overhead still further,
if I get a chance I'll try it out and let you know if it results in a
further speed up.
-tim
def dist2(A, B):
    d = zeros([N, C], dtype=float)
    if N < C:
        raise NotImplementedError
    else:
        tmp = empty([N, K], float)
        tmp0 = tmp[:,0]             # view of column 0, reused as accumulator
        rangek = range(1, K)
        for j in range(C):
            subtract(A, B[j], tmp)  # tmp = A - B[j], no new allocation
            tmp *= tmp              # square in place
            for k in rangek:
                tmp0 += tmp[:,k]    # sum of squares collects in column 0
            sqrt(tmp0, d[:,j])      # write the result straight into d
    return d
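[Editor's note: as a sanity check, not part of the original mail, the function above can be cross-checked against the plain broadcasting computation. This restatement of the general-case branch takes N, C, K from the argument shapes instead of module globals:]

```python
import numpy as np

def dist2(A, B):
    # Same algorithm as Tim's version above; shapes taken from the arguments.
    N, K = A.shape
    C = B.shape[0]
    d = np.zeros((N, C), dtype=float)
    tmp = np.empty((N, K), dtype=float)
    tmp0 = tmp[:, 0]                  # view of column 0, reused as accumulator
    for j in range(C):
        np.subtract(A, B[j], tmp)     # tmp = A - B[j], in place
        tmp *= tmp                    # squared differences
        for k in range(1, K):
            tmp0 += tmp[:, k]         # sum of squares collects in column 0
        np.sqrt(tmp0, d[:, j])        # write result straight into d's column j
    return d

A = np.random.random((50, 4))
B = np.random.random((3, 4))
ref = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2))
assert np.allclose(dist2(A, B), ref)
```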
>Regards,
>
>Sebastian.
>
>On 6/19/06, Alan G Isaac <aisaac at american.edu> wrote:
>
>
>>On Sun, 18 Jun 2006, Tim Hochberg apparently wrote:
>>
>>
>>
>>>Alan G Isaac wrote:
>>>
>>>
>>>>On Sun, 18 Jun 2006, Sebastian Beca apparently wrote:
>>>>
>>>>
>>>>>def dist():
>>>>>d = zeros([N, C], dtype=float)
>>>>>if N < C:
>>>>>    for i in range(N):
>>>>>        xy = A[i] - B
>>>>>        d[i,:] = sqrt(sum(xy**2, axis=1))
>>>>>    return d
>>>>>else:
>>>>>    for j in range(C):
>>>>>        xy = A - B[j]
>>>>>        d[:,j] = sqrt(sum(xy**2, axis=1))
>>>>>    return d
>>>>>
>>>>>
>>>>But that is 50% slower than Johannes's version:
>>>>
>>>>
>>>>def dist_loehner1():
>>>> d = A[:, newaxis, :] - B[newaxis, :, :]
>>>> d = sqrt((d**2).sum(axis=2))
>>>> return d
>>>>
>>>>
>>>Are you sure about that? I just ran it through timeit, using Sebastian's
>>>array sizes, and I get Sebastian's version being 150% faster. This
>>>could well be cache-size dependent, so it may vary from box to box, but I'd
>>>expect Sebastian's current version to scale better in general.
>>>
>>>
>>No, I'm not sure.
>>Script attached bottom.
>>Most recent output follows:
>>for reasons I have not determined,
>>it doesn't match my previous runs ...
>>Alan
>>
>>
>>
>>>>>execfile(r'c:\temp\temp.py')
>>>>>
>>>>>
>>dist_beca : 3.042277
>>dist_loehner1: 3.170026
>>
>>
>>#################################
>>#THE SCRIPT
>>import sys
>>sys.path.append("c:\\temp")
>>import numpy
>>from numpy import *
>>import timeit
>>
>>
>>K = 10
>>C = 2500
>>N = 3 # One could switch around C and N now.
>>A = numpy.random.random( [N, K] )
>>B = numpy.random.random( [C, K] )
>>
>># beca
>>def dist_beca():
>> d = zeros([N, C], dtype=float)
>> if N < C:
>> for i in range(N):
>> xy = A[i] - B
>> d[i,:] = sqrt(sum(xy**2, axis=1))
>> return d
>> else:
>> for j in range(C):
>> xy = A - B[j]
>> d[:,j] = sqrt(sum(xy**2, axis=1))
>> return d
>>
>>#loehnert
>>def dist_loehner1():
>> # drawback: memory usage temporarily doubled
>> # solution see below
>> d = A[:, newaxis, :] - B[newaxis, :, :]
>> # written as 3 expressions for more clarity
>> d = sqrt((d**2).sum(axis=2))
>> return d
>>
>>
>>if __name__ == "__main__":
>> t1 = timeit.Timer('dist_beca()', 'from temp import dist_beca').timeit(100)
>> t8 = timeit.Timer('dist_loehner1()', 'from temp import dist_loehner1').timeit(100)
>> fmt="%-10s:\t"+"%10.6f"
>> print fmt%('dist_beca', t1)
>> print fmt%('dist_loehner1', t8)
>>
>>
>>
>>
>>_______________________________________________
>>Numpy-discussion mailing list
>>Numpy-discussion at lists.sourceforge.net
>>https://lists.sourceforge.net/lists/listinfo/numpy-discussion
>>
>>
>>