# [Numpy-discussion] more functions for MLab.py: find, unique, union, intersect, etc.

Mark Histed histed at MIT.EDU
Wed Aug 7 17:02:18 CDT 2002

```I've written some glue functions along the lines of what's in MLab.py:
find
unique
union
intersect
setdiff
setxor

Quick questions:
* any opinions on the algorithms used?
* are these worth adding to MLab.py?  If so, do I need to provide
documentation updates?  I can send this as a patch to MLab.py too.
* Does anyone have opinions on what to do about the optional outputs from
matlab functions?  Is it worth trying to provide index matrices as second
arguments, for example?  If so, any opinions on implementation?

a matlab user trying to get started with Numeric,
Mark

-------------- next part --------------
from Numeric import *

# Notes:
#   I've used the existing MLab.py documentation conventions, not the pydoc
#   (PEP 256) ones.
#
#   Optional arguments:  Most of matlab's functions return 1 or more optional
#   arguments.  I'm not sure what to do here, we could add a keyword argument
#   to each python function that is a flag specifying
#   whether or not to return the optional arguments.
#   Right now we return only the first result
#

def __tovector(M):
"""Convert an input array to a vector"""
return (resize(M, (multiply.reduce(shape(M)),)))

def find(M):
"""find(M) returns the indices where non-zero values occur in the input
matrix.  If M is a matrix, it is reshaped to a vector.  Note that the
indices returned are python-style (0...length-1) rather than
Matlab-style (1...length)."""
V = __tovector(M)
return compress(M, arange(0,len(V)))

## Set functions

# Notes:
#   unique is implemented by sorting the vectors first.  The algorithm
#   is N*log(N) in speed (due to the sorting) and linear in N in
#   memory usage.  Anyone have better ideas?  You can use
#   equal.outer(N,N) but that's N^2 in both memory usage and speed.

#   ismember uses equal.outer and is thus linear in the sizes of both
#   vectors in terms of both speed and memory.  If the vectors are of
#   equal size, that means the algorithm is N^2 in the size of the
#   vectors.  This may be acceptable because the second vector is
#   generally much smaller than the first.  There are probably better
#   ways to do it but my brain is tired.
#
#   Index matrices as second arguments: see notes about optional arguments
#   above.

def unique(N):
"""unique(N) returns a vector containing the same values as in N but with
all repetitions removed.
The result is sorted."""
# Matlab's unique accepts an N-d matrix but reshapes it to be a vector.
# Also, Matlab's function always sorts.
Nsort = sort(__tovector(N))
return concatenate(([Nsort[0]],
compress(not_equal(Nsort[:-1], Nsort[1:]), Nsort[1:])))

def ismember(V,S):
"""ismember(V,S) returns a vector the same size as V containing 1 where the
corresponding element of V is in the set S, and 0 otherwise."""
assert(len(shape(V)) == 1, 'First argument must be a vector')
return logical_or.reduce(equal.outer(V,S),1)

def intersect(M,N):
"""intersect(M,N) returns a vector whose elements are the elements that
occur in both M and N.
The result is sorted."""
uM = unique(M)
return compress(ismember(uM, unique(N)), uM)

def union(M,N):
"""union(M,N) returns a vector whose elements are the elements that occur in
either M or N.
The result is sorted."""
return unique(concatenate((M,N)))

def setdiff(M,N):
"""setdiff(M,N) returns a vector containing all the elements of M that
are not in N.
The result is sorted."""
uM = unique(M)
return compress(logical_not(ismember(uM, unique(N))), uM)

def setxor(M,N):
"""setxor(M,N) returns a vector containing all the elements in the union of
M and N that are not in the intersection of M and N.
The result is sorted."""
return setdiff(union(M,N), intersect(M,N))

```