[Numpy-discussion] Proposal for matrix_rank function in numpy

Matthew Brett matthew.brett@gmail....
Tue Dec 15 14:16:25 CST 2009


Is it reasonable to summarize that, to avoid confusion, we keep
'matrix_rank' as the name?

I've edited as Robert suggested, attempting to adopt a more suitable
tone in the docstring...

Thanks a lot,


def matrix_rank(M, tol=None):
    ''' Return rank of matrix using SVD method

    Rank of the array is the number of SVD singular values of the
    array that are greater than `tol`.

    M : array-like
        array of <=2 dimensions
    tol : {None, float}
         threshold below which SVD values are considered zero. If `tol`
         is None, and `S` is an array with singular values for `M`, and
         `eps` is the epsilon value for datatype of `S`, then `tol` set
         to ``S.max() * eps``.

    >>> matrix_rank(np.eye(4)) # Full rank matrix
    >>> I=np.eye(4); I[-1,-1] = 0. # rank deficient matrix
    >>> matrix_rank(I)
    >>> matrix_rank(np.zeros((4,4))) # All zeros - zero rank
    >>> matrix_rank(np.ones((4,))) # 1 dimension - rank 1 unless all 0
    >>> matrix_rank(np.zeros((4,)))
    >>> matrix_rank([1]) # accepts array-like

    Golub and van Loan [1]_ define "numerical rank deficiency" as using
    tol=eps*S[0] (where S[0] is the maximum singular value and thus the
    2-norm of the matrix). This is one definition of rank deficiency,
    and the one we use here.  When floating point roundoff is the main
    concern, then "numerical rank deficiency" is a reasonable choice. In
    some cases you may prefer other definitions. The most useful measure
    of the tolerance depends on the operations you intend to use on your
    matrix. For example, if your data come from uncertain measurements
    with uncertainties greater than floating point epsilon, choosing a
    tolerance near that uncertainty may be preferable.  The tolerance
    may be absolute if the uncertainties are absolute rather than

    .. [1] G. H. Golub and C. F. Van Loan, _Matrix Computations_.
    Baltimore: Johns Hopkins University Press, 1996.
    M = np.asarray(M)
    if M.ndim > 2:
        raise TypeError('array should have 2 or fewer dimensions')
    if M.ndim < 2:
        return int(not np.all(M==0))
    S = npl.svd(M, compute_uv=False)
    if tol is None:
        tol = S.max() * np.finfo(S.dtype).eps
    return np.sum(S > tol)

More information about the NumPy-Discussion mailing list