[Numpy-discussion] more functions for MLab.py: find, unique, union, intersect, etc.

Mark Histed histed at MIT.EDU
Wed Aug 7 17:02:18 CDT 2002

I've written some glue functions along the lines of what's in MLab.py:

Quick questions:
  * any opinions on the algorithms used?
  * are these worth adding to MLab.py?  If so, do I need to provide 
    documentation updates?  I can send this as a patch to MLab.py too.
  * Does anyone have opinions on what to do about the optional outputs from 
    matlab functions?  Is it worth trying to provide index matrices as second 
    arguments, for example?  If so, any opinions on implementation?

a matlab user trying to get started with Numeric,

-------------- next part --------------
from Numeric import *

# Notes:
#   I've used the existing MLab.py documentation conventions, not the pydoc
#   (PEP 256) ones.
#   Optional arguments:  Most of matlab's functions return 1 or more optional
#   arguments.  I'm not sure what to do here, we could add a keyword argument
#   to each python function that is a flag specifying
#   whether or not to return the optional arguments.
#   Right now we return only the first result

def __tovector(M):
    """Convert an input array to a vector"""
    return (resize(M, (multiply.reduce(shape(M)),)))

def find(M):
    """find(M) returns the indices where non-zero values occur in the input
    matrix.  If M is a matrix, it is reshaped to a vector.  Note that the
    indices returned are python-style (0...length-1) rather than
    Matlab-style (1...length)."""
    V = __tovector(M)
    return compress(M, arange(0,len(V)))

## Set functions

# Notes:
#   unique is implemented by sorting the vectors first.  The algorithm
#   is N*log(N) in speed (due to the sorting) and linear in N in
#   memory usage.  Anyone have better ideas?  You can use
#   equal.outer(N,N) but that's N^2 in both memory usage and speed.

#   ismember uses equal.outer and is thus linear in the sizes of both
#   vectors in terms of both speed and memory.  If the vectors are of
#   equal size, that means the algorithm is N^2 in the size of the
#   vectors.  This may be acceptable because the second vector is
#   generally much smaller than the first.  There are probably better
#   ways to do it but my brain is tired.
#   Index matrices as second arguments: see notes about optional arguments 
#   above.

def unique(N):
    """unique(N) returns a vector containing the same values as in N but with
    all repetitions removed.
    The result is sorted."""
    # Matlab's unique accepts an N-d matrix but reshapes it to be a vector.
    # Also, Matlab's function always sorts.   
    Nsort = sort(__tovector(N))
    return concatenate(([Nsort[0]],
                        compress(not_equal(Nsort[:-1], Nsort[1:]), Nsort[1:])))

def ismember(V,S):
    """ismember(V,S) returns a vector the same size as V containing 1 where the
    corresponding element of V is in the set S, and 0 otherwise."""
    assert(len(shape(V)) == 1, 'First argument must be a vector')
    return logical_or.reduce(equal.outer(V,S),1)
def intersect(M,N):
    """intersect(M,N) returns a vector whose elements are the elements that
    occur in both M and N.
    The result is sorted."""
    uM = unique(M)
    return compress(ismember(uM, unique(N)), uM)

def union(M,N):
    """union(M,N) returns a vector whose elements are the elements that occur in
    either M or N.
    The result is sorted."""
    return unique(concatenate((M,N)))

def setdiff(M,N):
    """setdiff(M,N) returns a vector containing all the elements of M that
    are not in N.
    The result is sorted."""
    uM = unique(M)
    return compress(logical_not(ismember(uM, unique(N))), uM)

def setxor(M,N):
    """setxor(M,N) returns a vector containing all the elements in the union of
    M and N that are not in the intersection of M and N.
    The result is sorted."""
    return setdiff(union(M,N), intersect(M,N))

More information about the Numpy-discussion mailing list