[Numpy-discussion] Proposal for matrix_rank function in numpy

josef.pktd@gmai... josef.pktd@gmai...
Tue Dec 15 11:12:37 CST 2009


On Tue, Dec 15, 2009 at 12:01 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
> Hi,
>
> Following on from the occasional discussion on the list, can I propose
> a small matrix_rank function for inclusion in numpy/linalg?
>
> I suggest it because it seems rather a basic need for linear algebra,
> and it's very small and simple...
>
> I've appended an implementation with some doctests in the hope that it
> will be acceptable,
>
> Robert - I hope you don't mind me quoting you in the notes.
>
> Thanks a lot,
>
> Matthew
>
>
> def matrix_rank(M, tol=None):
>    ''' Return rank of matrix using SVD method
>
>    Rank of the array is the number of SVD singular values of the
>    array that are greater than `tol`.
>
>    Parameters
>    ----------
>    M : array-like
>        array of <=2 dimensions
>    tol : {None, float}
>         threshold below which SVD values are considered zero. If `tol`
>         is None, and `S` is an array with singular values for `M`, and
>         `eps` is the epsilon value for datatype of `S`, then `tol` set
>         to ``S.max() * eps``.
>
>    Examples
>    --------
>    >>> matrix_rank(np.eye(4)) # Full rank matrix
>    4
>    >>> matrix_rank(np.c_[np.eye(4),np.eye(4)]) # Rank deficient matrix
>    4
>    >>> matrix_rank(np.zeros((4,4))) # All zeros - zero rank
>    0
>    >>> matrix_rank(np.ones((4,))) # 1 dimension - rank 1 unless all 0
>    1
>    >>> matrix_rank(np.zeros((4,)))
>    0
>    >>> matrix_rank([1]) # accepts array-like
>    1
>
>    Notes
>    -----
>    Golub and van Loan define "numerical rank deficiency" as using
>    tol=eps*S[0] (note that S[0] is the maximum singular value and thus
>    the 2-norm of the matrix). There really is not one definition, much
>    like there isn't a single definition of the norm of a matrix. For
>    example, if your data come from uncertain measurements with
>    uncertainties greater than floating point epsilon, choosing a
>    tolerance of about the uncertainty is probably a better idea (the
>    tolerance may be absolute if the uncertainties are absolute rather
>    than relative, even). When floating point roundoff is your concern,
>    then "numerical rank deficiency" is a better concept, but exactly
>    what the relevant measure of the tolerance is depends on the
>    operations you intend to do with your matrix. [RK, numpy mailing
>    list]
>
>    References
>    ----------
>    Matrix Computations by Golub and van Loan
>    '''
>    M = np.asarray(M)
>    if M.ndim > 2:
>        raise TypeError('array should have 2 or fewer dimensions')
>    if M.ndim < 2:
>        return int(not np.all(M==0))
>    S = npl.svd(M, compute_uv=False)
>    if tol is None:
>        tol = S.max() * np.finfo(S.dtype).eps
>    return np.sum(S > tol)
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

This was missing from numpy compared to matlab and gauss.

If we put it in linalg next to np.linalg.cond,  then we could shorten
the name to `rank`, since the meaning of np.linalg.rank should be
pretty obvious.

Josef


More information about the NumPy-Discussion mailing list