[Numpy-discussion] "Match" two arrays

eat e.antero.tammi@gmail....
Thu Apr 1 05:54:47 CDT 2010

Shailendra <shailendra.vikas <at> gmail.com> writes:

> Hi All,
> I want to make a function which should be like this
> <code>
> cordinates1=(x1,y1) # x1 and y1 are x-cord and y-cord of a large
> number of points
> cordinates2=(x2,y2) # similar to condinates1
> indices1,indices2= match_cordinates(cordinates1,cordinates2)
> <code>
> (x1[indices1],y1[indices1]) "matches" (x2[indices2],y2[indices2])
> where definition of "match" is such that :
> If A is closest point to B and distance between A and B is less that
> delta than it is a "match".
> If A is closest point to B and distance between A and B is more that
> delta than there is no match.
> Every point has either 1 "match"(closest point) or none

This logic is problematic in general case. See below. You may need to be able 
to handle several pairs of 'closest points'!
> Also, the size of the cordinates1 and cordinates2 are quite large and
> "outer" should not be used. I can think of only C style code to
> achieve this. Can any one suggest pythonic way of doing this?
> Thanks,
> Shailendra
This is straightforward implementation as a starting point.


import numpy as np

def dist(p1, p2):
    return np.sqrt(np.sum((p1- p2)** 2, 0))

def cdist(p1, p2, trh):
    """Expects 2d arrays p1 and p2, with combatible first dimesions
    and a threshold.

    Returns indicies of points close to each other
    -ind[:, 0], array of p1 indicies
    -ind[:, 1], array of p2 indicies
    -ambi, list of list of ambiquous situations (where more
    than 1 pair of points are 'equally close')

    The indicies are aranged such that
    dist(p1[:, ind[k, 0]], p2[:, ind[k, 1]])< trh
    is true for all k.
    ind= []
    ambi= []
    for k in range(p2.shape[1]):
        d= dist(p1, p2[:, None, k])
        i= np.where(d< trh)[0]
        if 0< len(i):
            m= np.where(d[i]== d[i].min())[0] # problematic
            i= i[m].tolist()
            ind.append([i[0], k])
            if 1< len(m):
                ambi.append([ind[-1], i])
    return np.array(ind), ambi

if __name__ == '__main__':
    n= 10
    trh= 2e-1
    p1= np.round(np.random.rand(2, n), 1)
    p2= np.round(p1+ 1e-1* np.random.randn(2, n), 1)
    ind, ambi= cdist(p1, p2, trh)
    print 'points close to each other:'
    if 0< len(ind):
        print 'p1:'
        print p1[:, ind[:, 0]], ind[:, 0]
        print 'p2:'
        print p2[:, ind[:, 1]], ind[:, 1]
        print 'valid:'
        print dist(p1[:, ind[:, 0]], p2[:, ind[:, 1]])< trh
        print 'with ambiguous situation(s):'
        if ambi:
            print ambi
            print 'None'
        print 'None'

    import timeit
    n= 1e2
    trh= 2e-1
    rep= 5
    p1= np.random.rand(2, 1e3* n)
    p2= np.random.randn(2, n)
    def perf():
        ind, ambi= cdist(p1, p2, trh)

    print 'performance:'
    t= np.array(timeit.repeat(perf, repeat= rep, number= 1))/ rep
    print 'min: ', t.min(), 'mean: ', t.mean(), 'max: ', t.max()

More information about the NumPy-Discussion mailing list