[Numpy-discussion] Numpify this?

Anne Archibald peridot.faceted@gmail....
Sun May 18 04:14:20 CDT 2008


2008/5/18 Matt Crane <matt@snapbug.geek.nz>:
> On Sun, May 18, 2008 at 8:52 PM, Robert Kern <robert.kern@gmail.com> wrote:
>> Are there repeats?
> No, no repeats in the first column.
>
> I'm going to go get a cup of coffee before I forget to leave out any
> potentially vital information again. It's going to be a long day.

It can be done, though I had to be kind of devious. My solution might
even beat O(n log n), depending on how mergesort is implemented:

import numpy as np

def find_matches(A,B):
    """Find positions A_idx and B_idx so that A[A_idx]==B[B_idx]

    A and B should be sorted arrays with no repeats; A_idx and
    B_idx will be all the places they agree.

    >>> import numpy as np
    >>> A = np.array([1,2,4,5,7])
    >>> B = np.array([1,3,5,9])
    >>> find_matches(A,B)
    (array([0, 3]), array([0, 2]))
    """
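    AB = np.concatenate((A,B))
    # Stable sort: of two equal values, the one from A ends up first.
    idx = np.argsort(AB,kind="mergesort")
    AB_sorted = AB[idx]
    # Positions where a sorted value equals its right-hand neighbour.
    pairs = np.where(np.diff(AB_sorted)==0)[0]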
    A_pairs = idx[pairs]
    B_pairs = idx[pairs+1]-len(A)

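    # If the right-hand member of a pair came from A, both members did,
    # so A has a repeat; likewise a left-hand member from B means B does.
    if np.any(B_pairs<0):
        raise ValueError("first array contains a repeated element")
    if np.any(A_pairs>=len(A)):
        raise ValueError("second array contains a repeated element")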

    return A_pairs, B_pairs
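
To make the intermediate arrays concrete, here is a step-by-step trace of
the same computation on the docstring example. It is only an illustrative
sketch: the steps are written out at top level rather than inside the
function, and the variable names A_idx and B_idx are chosen here for
readability.

```python
import numpy as np

A = np.array([1, 2, 4, 5, 7])
B = np.array([1, 3, 5, 9])

AB = np.concatenate((A, B))
# Stable sort: among equal values, the one from A (lower index) comes first.
idx = np.argsort(AB, kind="mergesort")
print(idx)        # [0 5 1 6 2 3 7 4 8]
print(AB[idx])    # [1 1 2 3 4 5 5 7 9]

# Positions where a sorted value equals its right-hand neighbour.
pairs = np.where(np.diff(AB[idx]) == 0)[0]
print(pairs)      # [0 5]

A_idx = idx[pairs]               # left member of each pair came from A
B_idx = idx[pairs + 1] - len(A)  # right member came from B; undo the offset
print(A_idx, B_idx)              # [0 3] [0 2]
```

The matched values are 1 and 5, so A[A_idx] and B[B_idx] both come out as
[1 5].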

The idea is that diff is not a bad way to find repeats in a sequence;
so concatenate the two sequences and sort them (if you're lucky
mergesort will be fast because they're individually sorted, but in any
case we need stability). Of course, you have to take the pairs in the
sorted sequence and figure out where they came from, but fortunately
you don't even need to invert the permutation returned by argsort (do
we have a tool to invert permutations, or should one just use argsort
again?).
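
On the question of inverting a permutation, one common approach (offered
here as a sketch, not as an official NumPy tool) is to scatter arange
through the permutation, which takes linear time; calling argsort a second
time gives the same answer at sorting cost. The helper name
invert_permutation is made up for this example.

```python
import numpy as np

def invert_permutation(p):
    """Return q such that q[p[i]] == i for every i (hypothetical helper)."""
    p = np.asarray(p)
    q = np.empty_like(p)
    q[p] = np.arange(len(p))   # scatter: index i is written to slot p[i]
    return q

p = np.argsort(np.array([30, 10, 40, 20]), kind="mergesort")
print(p)                       # [1 3 0 2]
print(invert_permutation(p))   # [2 0 3 1]
print(np.argsort(p))           # [2 0 3 1], same result via a second sort
```

This assumes p really is a permutation of 0..len(p)-1; with repeated or
missing indices, some slots of q would be left uninitialized.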

Anne

