[Numpy-discussion] Distance Matrix speed

Tim Hochberg tim.hochberg at cox.net
Mon Jun 19 16:39:14 CDT 2006


Tim Hochberg wrote:

>Sebastian Beca wrote:
>
>>I just ran Alan's script and I don't get consistent results for 100
>>repetitions. I boosted it to 1000 and ran it several times. Which one
>>was faster varied a lot, but both ended up within about +-1.5% of
>>each other.
>>
>>When it comes to scaling, for my problem (fuzzy clustering), N is the
>>size of the dataset, which should span from thousands to millions. C
>>is the number of clusters, usually less than 10, and K the number of
>>features (the dimension I want to sum over), also usually less than
>>100. So mainly I'm concerned with scaling across N. I tried C=3, K=4,
>>N=1000, 2500, 5000, 7500, 10000. Also using 1000 runs, the results
>>were:
>>dist_beca: 1.1, 4.5, 16, 28, 37
>>dist_loehner1: 1.7, 6.5, 22, 35, 47
>>
>>I also tried scaling across K, with C=3, N=2500, and K=5-50. I
>>couldn't get any consistent results for small K, but both perform
>>about the same (+-2%) for large K (K>15).
>>
>>I'm not sure how these work in the backend, so I can't argue as to
>>why one should scale better than the other.
>
>The reason I suspect that dist_beca should scale better is that 
>dist_loehner1 generates an intermediate array of size NxCxK, while 
>dist_beca produces intermediate matrices that are only NxK or CxK.  For 
>large problems, allocating that extra memory and fetching it into and 
>out of the cache can be a bottleneck.
>
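[Editor's note: for concreteness, here's a minimal sketch of the allocation difference described above. The sizes are assumed from Sebastian's C=3, K=4 test case, and the two styles are reproduced inline rather than being the exact functions from the thread.]

```python
import numpy as np

N, C, K = 1000, 3, 4           # assumed sizes, per Sebastian's test case
A = np.random.random((N, K))
B = np.random.random((C, K))

# dist_loehner1 style: broadcasting materializes a full N x C x K
# intermediate array before anything is reduced
diff = A[:, np.newaxis, :] - B[np.newaxis, :, :]
assert diff.shape == (N, C, K)

# dist_beca style: looping over the C clusters only ever holds an
# N x K temporary, i.e. C times less intermediate memory
d = np.empty((N, C))
for j in range(C):
    xy = A - B[j]              # N x K temporary
    d[:, j] = np.sqrt((xy ** 2).sum(axis=1))

# both routes compute the same distance matrix
assert np.allclose(d, np.sqrt((diff ** 2).sum(axis=2)))
```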
>Here's another version that allocates even less in the way of
>temporaries, at the expense of being borderline incomprehensible. It
>still allocates an NxK temporary array, but it allocates it once ahead
>of time and then reuses it for all subsequent calculations. You're
>welcome to use it, but I'm not sure I'd recommend it unless this
>function is really a speed bottleneck, as it could end up being hard
>to read later (I left implementing the N<C case as an exercise for the
>reader...).
>
>I have another idea that might reduce the memory overhead still further, 
>if I get a chance I'll try it out and let you know if it results in a 
>further speed up.
>
>-tim
>
>
>    def dist2(A, B):
>        d = zeros([N, C], dtype=float)
>        if N < C:
>            raise NotImplementedError
>        else:
>            tmp = empty([N, K], float)
>            tmp0 = tmp[:,0]
>            rangek = range(1,K)
>            for j in range(C):
>                subtract(A, B[j], tmp)
>                tmp *= tmp
>                # accumulate the squared differences into column 0
>                for k in rangek:
>                    tmp0 += tmp[:,k]
>                sqrt(tmp0, d[:,j])
>        return d
>  
>
Speaking of scaling: I tried this with K=25000 (10x greater than 
Sebastian's original numbers). Much to my surprise it performed somewhat 
worse than Sebastian's dist() at large K. Below is a modified 
dist2 that performs about the same (marginally better here) for large K, 
as well as a dist3 that performs about 50% better at both K=2500 and 
K=25000.

-tim

    def dist2(A, B):
        d = empty([N, C], dtype=float)
        if N < C:
            raise NotImplementedError
        else:
            tmp = empty([N, K], float)
            for j in range(C):
                subtract(A, B[j], tmp)
                tmp **= 2
                d[:,j] = sum(tmp, axis=1)
                sqrt(d[:,j], d[:,j])
        return d

    def dist3(A, B):
        d = zeros([N, C], dtype=float)
        rangek = range(K)
        if N < C:
            raise NotImplementedError
        else:
            # only an N-element temporary; accumulate column by column
            tmp = empty([N], float)
            for j in range(C):
                for k in rangek:
                    subtract(A[:,k], B[j,k], tmp)
                    tmp **= 2
                    d[:,j] += tmp
                sqrt(d[:,j], d[:,j])
        return d
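[Editor's note: a quick sanity check for dist3, as a sketch. N, C, K and the random arrays are assumed here, since the functions above read them as globals; the check compares against a plain broadcasting reference.]

```python
import numpy as np
from numpy import zeros, empty, subtract, sqrt

N, C, K = 1000, 3, 4           # assumed sizes; dist3 reads these as globals
A = np.random.random((N, K))
B = np.random.random((C, K))

def dist3(A, B):
    d = zeros([N, C], dtype=float)
    tmp = empty([N], float)
    for j in range(C):
        for k in range(K):
            subtract(A[:, k], B[j, k], tmp)   # in place: tmp = A[:,k] - B[j,k]
            tmp **= 2
            d[:, j] += tmp
        sqrt(d[:, j], d[:, j])                # in-place square root of the column
    return d

# straightforward (memory-hungry) broadcasting reference
ref = sqrt(((A[:, np.newaxis, :] - B[np.newaxis, :, :]) ** 2).sum(axis=2))
assert np.allclose(dist3(A, B), ref)
```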


>>Regards,
>>
>>Sebastian.
>>
>>On 6/19/06, Alan G Isaac <aisaac at american.edu> wrote:
>>>On Sun, 18 Jun 2006, Tim Hochberg apparently wrote:
>>>>Alan G Isaac wrote:
>>>>>On Sun, 18 Jun 2006, Sebastian Beca apparently wrote:
>>>>>>def dist():
>>>>>>    d = zeros([N, C], dtype=float)
>>>>>>    if N < C:
>>>>>>        for i in range(N):
>>>>>>            xy = A[i] - B
>>>>>>            d[i,:] = sqrt(sum(xy**2, axis=1))
>>>>>>        return d
>>>>>>    else:
>>>>>>        for j in range(C):
>>>>>>            xy = A - B[j]
>>>>>>            d[:,j] = sqrt(sum(xy**2, axis=1))
>>>>>>        return d
>>>>>But that is 50% slower than Johannes's version:
>>>>>def dist_loehner1():
>>>>>    d = A[:, newaxis, :] - B[newaxis, :, :]
>>>>>    d = sqrt((d**2).sum(axis=2))
>>>>>    return d
>>>>Are you sure about that? I just ran it through timeit, using Sebastian's
>>>>array sizes, and I get Sebastian's version being 150% faster. This
>>>>could well be cache-size dependent, so may vary from box to box, but I'd
>>>>expect Sebastian's current version to scale better in general.
>>>No, I'm not sure.
>>>Script attached bottom.
>>>Most recent output follows:
>>>for reasons I have not determined,
>>>it doesn't match my previous runs ...
>>>Alan
>>>
>>>>>>execfile(r'c:\temp\temp.py')
>>>dist_beca :       3.042277
>>>dist_loehner1:    3.170026
>>>
>>>
>>>#################################
>>>#THE SCRIPT
>>>import sys
>>>sys.path.append("c:\\temp")
>>>import numpy
>>>from numpy import *
>>>import timeit
>>>
>>>
>>>K = 10
>>>C = 2500
>>>N = 3 # One could switch around C and N now.
>>>A = numpy.random.random( [N, K] )
>>>B = numpy.random.random( [C, K] )
>>>
>>># beca
>>>def dist_beca():
>>>   d = zeros([N, C], dtype=float)
>>>   if N < C:
>>>       for i in range(N):
>>>           xy = A[i] - B
>>>           d[i,:] = sqrt(sum(xy**2, axis=1))
>>>       return d
>>>   else:
>>>       for j in range(C):
>>>           xy = A - B[j]
>>>           d[:,j] = sqrt(sum(xy**2, axis=1))
>>>   return d
>>>
>>>#loehnert
>>>def dist_loehner1():
>>>       # drawback: memory usage temporarily doubled
>>>       # solution see below
>>>       d = A[:, newaxis, :] - B[newaxis, :, :]
>>>       # written as 3 expressions for more clarity
>>>       d = sqrt((d**2).sum(axis=2))
>>>       return d
>>>
>>>
>>>if __name__ == "__main__":
>>>       t1 = timeit.Timer('dist_beca()', 'from temp import dist_beca').timeit(100)
>>>       t8 = timeit.Timer('dist_loehner1()', 'from temp import dist_loehner1').timeit(100)
>>>       fmt="%-10s:\t"+"%10.6f"
>>>       print fmt%('dist_beca', t1)
>>>       print fmt%('dist_loehner1', t8)
>>>
>>>_______________________________________________
>>>Numpy-discussion mailing list
>>>Numpy-discussion at lists.sourceforge.net
>>>https://lists.sourceforge.net/lists/listinfo/numpy-discussion
>>>

More information about the Numpy-discussion mailing list