[SciPy-dev] scipy.optimize.nonlin rewrite

Pauli Virtanen pav@iki...
Wed Dec 17 14:55:57 CST 2008


Sorry for the delay -- busy times...

First, a short summary of what I think it would be good to do:

- Remove broyden1_modified

- Remove broyden_generalized; the code is identical to anderson
  with w0=0.

- Remove broyden_modified; if I understand Eyert [1] correctly, this method
  is expected to be always inferior to (extended) Anderson mixing. So I don't
  see a real need for retaining this in Scipy. Also, I don't know how to check
  that it works correctly.

- Remove anderson2, because it does not work for any of the test problems.
  Since `anderson` seems to work, I don't think it's wise to try to get
  this second implementation up to speed.

- Change broyden1 to update the inverse Jacobian using Sherman-Morrison.
  No need for inverting J on every iteration.

- Convert broyden3 to a simple limited-memory method.
  Rename it, too.

- Change the API so that there's a single nonlinear solver function,
  which takes an option that determines which Jacobian approximation to use.

- Maybe add a reverse-communication interface, see below.

.. [1] V. Eyert, J. Comp. Phys, 124, 271 (1996).
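
To make the broyden1 point above concrete, here is a sketch of the
Sherman-Morrison update of the inverse Jacobian (the function name is
hypothetical; dx and df are the most recent steps in x and F):

```python
import numpy as np

def broyden1_update(H, dx, df):
    # Sherman-Morrison update of the inverse Jacobian H for
    # Broyden's first ("good") method; no matrix inversion needed.
    # dx = x_n - x_{n-1}, df = F(x_n) - F(x_{n-1})
    v = H.dot(df)
    return H + np.outer(dx - v, dx.dot(H)) / dx.dot(v)
```

The next step is then simply dx = -H.dot(F(x)), so the O(N^3)
inversion on every iteration disappears.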

Here comes the rest:

Mon, 08 Dec 2008 18:43:47 +0100, Ondrej Certik wrote:
> On Sat, Nov 29, 2008 at 8:40 PM, Pauli Virtanen <pav@iki.fi> wrote:
> First let me apologize for taking so long to reply. I wrote this code
> in the first place and I am happy that Pauli has rewritten it. I agree
> with the general direction, but I think this change should not go into
> 0.7.0, as it changes the interface and it is not well tested yet. Also,
> you renamed all the working broyden implementations that I use as
> BadBroyden, so I suggest to name them GoodBroyden, more below.

This is probably not going to be in 0.7.0, even though the release was
postponed. The API might still need a bit of thinking, and it's probably best
to test this thoroughly.

The difference between Broyden's good/first and bad/second methods is, to my
understanding, the one mentioned on Wikipedia: both methods can be written as
updates to the inverse Jacobian, because each update is rank-1, so the
Sherman-Morrison formula applies. The difference can be seen by looking at
the domains (right-hand vectors) of the rank-1 inverse updates: Broyden's "good"
method uses J_{n-1}^{-T} dx, whereas the "bad" method uses df. These two
vectors are not equal, even though both updates satisfy the quasi-Newton
(secant) condition.
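
Side by side, the two rank-1 inverse updates could be sketched like this
(function names hypothetical; H is the current inverse Jacobian
approximation):

```python
import numpy as np

def update_good(H, dx, df):
    # Broyden's first/"good" method: the rank-1 correction acts
    # along H^T dx (i.e. J_{n-1}^{-T} dx)
    v = dx.dot(H)
    return H + np.outer(dx - H.dot(df), v) / v.dot(df)

def update_bad(H, dx, df):
    # Broyden's second/"bad" method: the correction acts along df
    return H + np.outer(dx - H.dot(df), df) / df.dot(df)
```

Both satisfy the secant condition H_new.dot(df) == dx, but the
resulting matrices differ in general.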

>> - Rewrote docstrings.
> in general +1, but I don't understand what all the implementations
> actually do. I suggest to also port my comments from the module
> docstring into the respective classes. I can help you with that, after
> we agree on other things below.

I'm not sure how much we can say about relative efficiency of the various
methods. We can cite some references on this, but I'd rather not say much
based on a couple of test problems.

>> * I'd like to remove `broyden2` as the actual Jacobian approximation in
>>  this appears to be the same as in `broyden3`, and there does not
>>  appear to be much difference in the work involved in the two.
>>  Ondrej, since you wrote the original code, do you think there is a
>>  reason to keep both?
> I think it is, at least for tutorial purposes and also as an easy way to
> check that all is fine, because there may be some numerical
> differences. Also, for a lot of iterations, the memory consumption of
> broyden3 is not limited (is it?). Broyden2 just stores the NxN dense
> matrix (that can of course be big), but that's it.

Well, I think having two implementations of the same method does not really
make sense, and scipy.optimize should contain only production-quality code.

But yes, the memory consumption of broyden3 and broyden2 is not exactly the
same, although neither is a real limited-memory method. See below.

>> * Also, which articles were used as reference for the non-Quasi-Newton
>>  algorithms:
>>  - `broyden_modified`. This appears to be a bit exotic, and it uses
>>    several magic constants (`w0`, `wl`) whose meaning is not clear to
>>    me.
>>    A reference would be helpful here, also for the user who has to
>>    choose the parameters as appropriate for his/her specific problem.
> You can use my master thesis:
> http://ondrej.certik.cz/site_media/cookbook/master.pdf
> pages 27-31. Everything is explained in there, plus references given to
> the original literature.

Looking at Eyert's paper, it seems to me that there's no need to retain
this method.

>>  - `broyden_generalized`, `anderson`, `anderson2`. These appear to be
>>    variants of Anderson mixing, so probably we only want at most one of
>>    these. Also, broyden_generalized() == anderson(w0=0), am I correct?
> Yes and no, see my master thesis for details and references. There are
> some subtle differences. But those methods have never worked for me
> well.
>>    `anderson` and `anderson2` don't appear to function equivalently,
>>    and I suspect the update formula in the latter is wrong, since this
>>    algorithm can't solve any of the test problems. Do you have a
>>    reference for this?
> Yes, see my master thesis, pages above, it's the reference [4].

Ok, anderson2 and anderson should be exactly equivalent. This probably means
that there's a bug in anderson2, or maybe the default parameters are somehow
strange. Of the three, I'd retain only `anderson`.

>>  - `excitingmixing`. A reference would be useful, to clarify the
>>    heuristics used.
>>  - `linearmixing`. I'm not sure that Scipy should advocate this
>>    method :)
>>  Linearmixing and excitingmixing also seem to require something from
>>  the objective function, possibly that the eigenvalues of its Jacobian
>>  have the opposite sign than `alpha`. For example, neither of them can
>>  find a solution for the equation ``I x = 0`` where ``I`` is the
>>  identity matrix (test function F2). So, I'm a bit tempted to remove
>>  also these routines, as it seems they probably are not too useful for
>>  general problems.
> Please don't remove those routines, they are essential for electronic
> structure calculations, as they are very robust, fast, and don't
> require any memory. They just work for many problems.

These are pretty straightforward methods, so maybe they can stay.  But the
docstrings should clearly say that these are intended only for specialized
problems.

I wonder if electronic structure (DFT) calculations are a bit of a special
case. I.e., are the eigenvalues of the Jacobian of the Kohn-Sham functional
F[n] all negative (or less than 1)?
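
A minimal sketch of plain linear mixing illustrates the sign requirement:
with F(x) = I x, the iterate is (1 + alpha)^n x0, so it converges only when
alpha has the opposite sign to the Jacobian eigenvalues (the function name
and signature here are hypothetical):

```python
import numpy as np

def linearmixing(F, x0, alpha, maxiter=50):
    # plain linear mixing: x <- x + alpha * F(x)
    x = np.asarray(x0, dtype=float)
    for _ in range(maxiter):
        x = x + alpha * F(x)
    return x

F2 = lambda x: x   # F(x) = I x; the Jacobian eigenvalues are all +1
# alpha < 0 (opposite sign to the eigenvalues): iterates shrink to 0
# alpha > 0: iterates grow like (1 + alpha)**n and diverge
```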

>> * One could add more algorithms suitable for large-scale problems; for
>>  example some limited-memory Broyden methods (eg. [1])
> I think in the references in my thesis, the broyden3 is sometimes called
> the limited-memory Broyden method. However, I checked the equations in
> [1] in your email and they need to do a singular-value-decomposition, so
> it seems there is yet another approach to this. So yes.

It's a part of a limited-memory method, but not the whole story, because the
present implementation does not actually throw any data away.  So, it's
an "unlimited memory" Broyden method :)

But it would be very easy (= 3 lines of code) to make it a simple
limited-memory method, just by throwing away iterates older than some M.  The
singular value reduction can be implemented later; there are probably many
different schemes proposed in the literature for how the reduction can be done...
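
The truncation could be as simple as storing the rank-1 update terms in a
fixed-length deque. This sketch assumes the inverse Jacobian is represented
as alpha*I plus a sum of rank-1 terms, roughly as in broyden3 (all names
hypothetical):

```python
from collections import deque
import numpy as np

class LowRankInverseJacobian:
    # Sketch: the inverse Jacobian is kept as alpha*I plus a sum of
    # rank-1 terms u_k v_k^T collected during the iteration.
    def __init__(self, alpha, M):
        self.alpha = alpha
        # deque(maxlen=M) silently drops the oldest pair: this is
        # essentially the "3 lines of code" limited-memory change
        self.terms = deque(maxlen=M)

    def append(self, u, v):
        self.terms.append((u, v))

    def solve(self, f):
        # apply H to f: alpha*f + sum_k u_k (v_k . f)
        x = self.alpha * f
        for u, v in self.terms:
            x = x + u * v.dot(f)
        return x
```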

> Let me know what you think about my comments. Basically, scipy should
> have as many (working and well tested) algorithms as possible, with the
> same interface, so that one can change the algorithms and see which
> works for the particular problem.

Yes, but this must be balanced against maintenance + testing costs.  I want to
be sure that the code I contribute works, and I wouldn't like to spend too much
time working on unproven methods. So I'd rather include only methods for which
there is reason to believe that they work well.

> I am glad that you are also using those methods, so that we can work
> on this together.

It'd be very useful if you could test this code, including the line search,
on a real large-scale problem. At present, I'm not sure whether it costs too
many extra function evaluations compared to its payoff.

Also, we might need to implement a reverse communication interface,
something on the lines of

    iterator = Solver(x0)
    for x in iterator:
        if iterator.converged():
            break
    else:
        raise RuntimeError("Didn't converge")

For expensive self-consistent iterations, this often feels the natural way
to use the nonlinear solver.
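
A minimal sketch of such an iterator-style solver (plain linear mixing is
used only as a stand-in update rule; the details are hypothetical):

```python
import numpy as np

class Solver:
    # Hypothetical iterator-style nonlinear solver: the iteration is
    # driven by the caller's for-loop, not by the library.
    def __init__(self, F, x0, alpha=-0.5, tol=1e-8, maxiter=100):
        self.F = F
        self.x = np.asarray(x0, dtype=float)
        self.alpha, self.tol, self.maxiter = alpha, tol, maxiter

    def converged(self):
        return np.linalg.norm(self.F(self.x)) < self.tol

    def __iter__(self):
        for _ in range(self.maxiter):
            self.x = self.x + self.alpha * self.F(self.x)
            # hand control back to the caller after every step
            yield self.x

solver = Solver(lambda x: x, [1.0])
for x in solver:
    if solver.converged():
        break
else:
    raise RuntimeError("Didn't converge")
```

The caller can then do arbitrary work between iterations (logging,
updating the expensive self-consistent functional, restarting), which is
what makes this style feel natural for such problems.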

Pauli Virtanen
