[Numpy-discussion] Matrix rank default tolerance - is it too low?
Charles R Harris
charlesr.harris@gmail....
Tue Jun 26 19:04:49 CDT 2012
On Tue, Jun 26, 2012 at 5:46 PM, Matthew Brett <matthew.brett@gmail.com>wrote:
> Hi,
>
> On Tue, Jun 26, 2012 at 4:39 PM, Benjamin Root <ben.root@ou.edu> wrote:
> >
> >
> > On Tuesday, June 26, 2012, Charles R Harris wrote:
> >>
> >>
> >>
> >> On Tue, Jun 26, 2012 at 3:42 PM, Matthew Brett <matthew.brett@gmail.com
> >
> >> wrote:
> >>
> >> Hi,
> >>
> >> On Mon, Jun 18, 2012 at 3:50 PM, Matthew Brett <matthew.brett@gmail.com
> >
> >> wrote:
> >> > Hi,
> >> >
> >> > On Sun, Jun 17, 2012 at 7:22 PM, Charles R Harris
> >> > <charlesr.harris@gmail.com> wrote:
> >> >>
> >> >>
> >> >> On Sat, Jun 16, 2012 at 2:33 PM, Matthew Brett
> >> >> <matthew.brett@gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> Hi,
> >> >>>
> >> >>> On Sat, Jun 16, 2012 at 8:03 PM, Matthew Brett
> >> >>> <matthew.brett@gmail.com>
> >> >>> wrote:
> >> >>> > Hi,
> >> >>> >
> >> >>> > On Sat, Jun 16, 2012 at 10:40 AM, Nathaniel Smith <njs@pobox.com>
> >> >>> > wrote:
> >> >>> >> On Fri, Jun 15, 2012 at 4:10 AM, Charles R Harris
> >> >>> >> <charlesr.harris@gmail.com> wrote:
> >> >>> >>>
> >> >>> >>>
> >> >>> >>> On Thu, Jun 14, 2012 at 8:06 PM, Matthew Brett
> One potential problem is that it implies that it will always be the
> same as any version of matlab's tolerance. What if they change it in
> a future release? How likely are we to even notice?
>
>
> >> >>> >>> <matthew.brett@gmail.com>
> >> >>> >>> wrote:
> >> >>> >>>>
> >> >>> >>>> Hi,
> >> >>> >>>>
> >> >>> >>>> I noticed that numpy.linalg.matrix_rank sometimes gives full
> rank
> >> >>> >>>> for
> >> >>> >>>> matrices that are numerically rank deficient:
> >> >>> >>>>
> >> >>> >>>> If I repeatedly make random matrices, then set the first column
> >> >>> >>>> to be
> >> >>> >>>> equal to the sum of the second and third columns:
> >> >>> >>>>
> >> >>> >>>> def make_deficient():
> >> >>> >>>> X = np.random.normal(size=(40, 10))
> >> >>> >>>> deficient_X = X.copy()
> >> >>> >>>> deficient_X[:, 0] = deficient_X[:, 1] + deficient_X[:, 2]
> >> >>> >>>> return deficient_X
> >> >>> >>>>
> >> >>> >>>> then the current numpy.linalg.matrix_rank algorithm returns
> full
> >> >>> >>>> rank
> >> >>> >>>> (10) in about 8 percent of cases (see appended script).
> >> >>> >>>>
> >> >>> >>>> I think this is a tolerance problem. The ``matrix_rank``
> >> >>> >>>> algorithm
> >> >>> >>>> does this by default:
> >> >>> >>>>
> >> >>> >>>> S = spl.svd(M, compute_uv=False)
> >> >>> >>>> tol = S.max() * np.finfo(S.dtype).eps
> >> >>> >>>> return np.sum(S > tol)
> >> >>> >>>>
> >> >>> >>>> I guess we'd we want the lowest tolerance that nearly always or
> >> >>> >>>> always
> >> >>> >>>> identifies numerically rank deficient matrices. I suppose one
> >> >>> >>>> way of
> >> >>> >>>> looking at whether the tolerance is in the right range is to
> >> >>> >>>> compare
> >> >>> >>>> the calculated tolerance (``tol``) to the minimum singular
> value
> >> >>> >>>> (``S.min()``) because S.min() in our case should be very small
> >> >>> >>>> and
> >> >>> >>>> indicate the rank deficiency. The mean value of tol / S.min()
> for
> >> >>> >>>> the
> >> >>> >>>> current algorithm, across many iterations, is about 2.8. We
> >> >>> >>>> might
> >> >>> >>>> hope this value would be higher than 1, but not much higher,
> >> >>> >>>> otherwise
> >> >>> >>>> we might be rejecting too many columns.
> >> >>> >>>>
> >> >>> >>>> Our current algorithm for tolerance is the same as the 2-norm
> of
> >> >>> >>>> M *
> >> >>> >>>> eps. We're citing Golub and Van Loan for this, but now I look
> at
> >> >>> >>>> our
> >> >>> >>>> copy (p 261, last para) - they seem to be suggesting using u *
> >> >>> >>>> |M|
> >> >>> >>>> where u = (p 61, section 2.4.2) eps / 2. (see [1]). I think
> the
> >> >>> >>>> Golub
> >>
> >>
> >> I'm fine with that, and agree that it is likely to lead to fewer folks
> >> wondering why Matlab and numpy are different. A good explanation in the
> >> function documentation would be useful.
> >>
> >> Chuck
> >>
> >
> > One potential problem is that it implies that it will always be the same
> as
> > any version of matlab's tolerance. What if they change it in a future
> > release? How likely are we to even notice?
>
> I guess that matlab is unlikely to change for the same reason that we
> would be reluctant to change, once we've found an acceptable value.
>
> I was thinking that we would say something like:
>
> """
> The default tolerance is :
>
> tol = S.max() * np.finfo(M.dtype).eps * max((m, n))
>
> This corresponds to the tolerance suggested in NR page X, and to the
> tolerance used by MATLAB at the time of writing (June 2012; see
> http://www.mathworks.com/help/techdoc/ref/rank.html).
> """
>
> I don't know whether we would want to track changes made by matlab -
> maybe we could have that discussion if they do change?
>
I wouldn't bother tracking Matlab, but I think the alternative threshold
could be mentioned in the notes. Something like
A less conservative threshold is ...
Maybe mention that because of numerical uncertainty there will always be a
chance that the computed rank could be wrong, but that with the
conservative threshold the rank is very unlikely to be less than the
computed rank.
Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20120626/c2490d8e/attachment.html
More information about the NumPy-Discussion
mailing list