[SciPy-dev] scipy.optimize.nonlin rewrite
Pauli Virtanen
pav@iki...
Sat Nov 29 13:40:45 CST 2008
Hi all,
I spent some time rewriting the scipy.optimize.nonlin module:
http://github.com/pv/scipy/tree/ticket-791/scipy/optimize/nonlin.py
The following things changed:
- Support tolerance-based stopping conditions (cf. ticket #791)
- Improved handling of input and return values from the functions,
so that they are now easier to use.
- Don't use np.matrix at all.
- Jacobian approximations factored into classes; the iteration code is
now in only one place, so trust-region handling or other improvements
can be added to a single place later on.
- There's now a line search along the direction obtained by inverting the
Jacobian. (But there's no check that it is an actual descent direction
for some merit function.)
- Rewrote docstrings.
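For concreteness, the line search can be sketched roughly like this (a
hypothetical minimal backtracking version, not the actual nonlin.py code):

```python
import numpy as np

def backtracking_line_search(F, x, dx, t0=1.0, beta=0.5, max_iter=10):
    # Shrink the step along dx until the residual norm decreases.
    # Sketch only: this checks merely that ||F|| goes down, not a
    # proper Armijo condition on a merit function.
    f0 = np.linalg.norm(F(x))
    t = t0
    for _ in range(max_iter):
        if np.linalg.norm(F(x + t * dx)) < f0:
            break
        t *= beta
    return t

# 1-D example: the full Newton step for F(x) = x**3 - 1 from x = 0.1
# overshoots badly, so backtracking kicks in.
F = lambda x: x**3 - 1
x = np.array([0.1])
dx = -F(x) / (3 * x**2)   # Newton direction
t = backtracking_line_search(F, x, dx)
```

As noted above, accepting any decrease in ||F|| is weaker than requiring a
descent direction for a merit function, which is exactly the gap mentioned.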
The routines should produce the same output as previously. The tests are
still not very strong, however.
But I have now some questions:
* Getting this into 0.7.0: since this is a complete rewrite, is it too
late, and would it be better to wait for 0.8?
* Some of the algorithms in there don't appear to work too well, and
some appear to be redundant. I'd like to clean up this a bit, leaving
only known-good stuff in.
* I'd like to remove `broyden2`, as its Jacobian approximation appears
to be the same as in `broyden3`, and there does not appear to be much
difference in the work involved between the two.
Ondrej, since you wrote the original code, do you think there is
a reason to keep both?
* `broyden1_modified` appears, once you work out the matrix algebra, to
be updating the inverse Jacobian in a way that corresponds to
J := J + (y - J s / 2) s^T / ||s||^2
for the Jacobian (with s = dx, y = df). Apart from the factor
1/2, it's Broyden's good method. [1] One can also verify that the
updated inverse Jacobian does not satisfy the quasi-Newton condition
s = J^{-1} y, and that `broyden1_modified` doesn't generate the same
sequence as `broyden1`.
Hence, I'd like to remove this routine, unless there's some literature
that shows that the above works better than Broyden's method; Ondrej,
do you agree?
.. [1] http://en.wikipedia.org/wiki/Broyden%27s_method
http://en.wikipedia.org/wiki/Sherman%E2%80%93Morrison_formula
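For comparison, Broyden's good method itself (without the factor 1/2) can
be sketched like this; an illustration only, not the nonlin.py code:

```python
import numpy as np

def broyden_good(F, x0, J0, tol=1e-10, max_iter=100):
    # Broyden's "good" update: J := J + (y - J s) s^T / ||s||^2,
    # which makes the new J satisfy the secant condition J s = y.
    # Sketch only: dense Jacobian, no line search or safeguards.
    x = np.asarray(x0, dtype=float)
    J = np.asarray(J0, dtype=float)
    f = F(x)
    for _ in range(max_iter):
        if np.linalg.norm(f) < tol:
            break
        s = np.linalg.solve(J, -f)              # quasi-Newton step
        x = x + s
        f_new = F(x)
        y = f_new - f
        J = J + np.outer(y - J @ s, s) / (s @ s)
        f = f_new
    return x

# Example: solve x**2 = (2, 9) componentwise, starting from the exact
# Jacobian at the initial guess.
F = lambda x: x**2 - np.array([2.0, 9.0])
x0 = np.array([1.5, 3.5])
root = broyden_good(F, x0, np.diag(2 * x0))
```

With the extra 1/2 the updated J no longer satisfies J s = y, which is
the quasi-Newton condition mentioned above.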
* Also, which articles were used as reference for the non-Quasi-Newton
algorithms:
- `broyden_modified`. This appears to be a bit exotic, and it uses
several magic constants (`w0`, `wl`) whose meaning is not clear to
me.
A reference would be helpful here, also for the user who has to
choose the parameters as appropriate for his/her specific problem.
- `broyden_generalized`, `anderson`, `anderson2`. These appear to be
variants of Anderson mixing, so probably we only want at most
one of these. Also, broyden_generalized() == anderson(w0=0), am I
correct?
`anderson` and `anderson2` don't appear to function equivalently,
and I suspect the update formula in the latter is wrong, since this
algorithm can't solve any of the test problems. Do you have a
reference for this?
Is there a rule how `w0` should be chosen for some specific problem?
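For reference, here is how I understand plain (unregularized) Anderson
mixing; a hypothetical sketch with no `w0`, not what nonlin.py currently
does:

```python
import numpy as np

def anderson_mixing(F, x0, alpha=0.5, M=5, tol=1e-10, max_iter=100):
    # Anderson mixing sketch: minimize the norm of the mixed residual
    # over combinations of the last M residual differences, then take
    # the correspondingly mixed damped step.
    x = np.asarray(x0, dtype=float)
    f = F(x)
    dX, dF = [], []
    for _ in range(max_iter):
        if np.linalg.norm(f) < tol:
            break
        if dF:
            D = np.array(dF).T                    # residual differences
            gamma, *_ = np.linalg.lstsq(D, f, rcond=None)
            dx = alpha * f - np.array(dX).T @ gamma - alpha * (D @ gamma)
        else:
            dx = alpha * f                        # plain damped step
        x_new = x + dx
        f_new = F(x_new)
        dX.append(x_new - x)
        dF.append(f_new - f)
        dX, dF = dX[-M:], dF[-M:]
        x, f = x_new, f_new
    return x

# Example: the fixed point of cos, i.e. the root of F(x) = cos(x) - x.
root = anderson_mixing(lambda x: np.cos(x) - x, np.array([0.0]))
```

Presumably `w0` regularizes the least-squares problem above when the
stored residual differences become nearly linearly dependent, but that is
exactly the part I'd like a reference for.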
- `excitingmixing`. A reference would be useful, to clarify the
heuristics used.
- `linearmixing`. I'm not sure that Scipy should advocate this
method :)
`linearmixing` and `excitingmixing` also seem to require something from
the objective function, possibly that the eigenvalues of its Jacobian
have the opposite sign to `alpha`. For example, neither of them can
find a solution for the equation ``I x = 0`` where ``I`` is the
identity matrix (test function F2). So, I'm a bit tempted to remove
also these routines, as it seems they probably are not too useful for
general problems.
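The sign requirement can be seen already in the identity-matrix case; a
small demonstration (hypothetical helper, not the nonlin.py routine):

```python
import numpy as np

def linear_mixing(F, x0, alpha, n_iter=50):
    # Plain linear mixing: x <- x + alpha * F(x).  Sketch only.
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = x + alpha * F(x)
    return x

# For F(x) = I x the iteration is x <- (1 + alpha) x, which converges
# to the root 0 only when |1 + alpha| < 1, i.e. for -2 < alpha < 0.
F = lambda x: x
x0 = np.ones(3)
x_good = linear_mixing(F, x0, alpha=-0.5)   # contracts towards 0
x_bad = linear_mixing(F, x0, alpha=+0.5)    # diverges geometrically
```

So with the default positive `alpha`, any problem whose Jacobian has
positive eigenvalues (like F2) cannot be solved by these methods.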
* The code in there is still a somewhat naive implementation of the
inexact (Quasi-)Newton method, and one could add e.g. trust-region
handling or try to guarantee that the line search direction is always
a descent direction for a merit function. (I don't know if the latter
is possible unless one has a matrix representation of the Jacobian
approximation.) So I suspect there are problems for which e.g. the
MINPACK code will find solutions, but for which the nonlin.py code
fails.
* One could add more algorithms suitable for large-scale problems; for
example some limited-memory Broyden methods (e.g. [1]) or secant-Krylov
methods [2].
.. [1] http://www.math.leidenuniv.nl/~verduyn/publications/reports/equadiff.ps
.. [2] D. A. Knoll and D. E. Keyes. Jacobian-free Newton-Krylov methods:
a survey of approaches and applications. Journal of Computational
Physics, 193(2):357–397, 2004.
I have implementations for both types of algorithms that could
possibly go in after some polishing.
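To illustrate what I mean by Newton-Krylov, here is a rough Jacobian-free
sketch using GMRES with a finite-difference Jacobian-vector product (a toy
version of the general technique, not the implementation I have in mind):

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def jfnk_step(F, x, f, eps=1e-7):
    # One Jacobian-free Newton-Krylov step (sketch).  The product J v
    # is approximated by (F(x + eps*v) - F(x)) / eps, so no Jacobian
    # matrix is ever formed; GMRES solves J s = -f from these products.
    n = x.size
    def Jv(v):
        if np.linalg.norm(v) == 0:
            return np.zeros_like(v)
        return (F(x + eps * v) - f) / eps
    J = LinearOperator((n, n), matvec=Jv)
    s, info = gmres(J, -f, atol=1e-12)
    return x + s

# Newton iteration driven by matrix-free steps on a small 2-D system.
F = lambda x: np.array([x[0]**2 + x[1] - 3.0, x[0] - x[1]**2 + 1.0])
x = np.array([1.5, 1.5])
for _ in range(20):
    f = F(x)
    if np.linalg.norm(f) < 1e-10:
        break
    x = jfnk_step(F, x, f)
```

The memory cost per iteration is only a few vectors, which is what makes
this family attractive for large-scale problems.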
--
Pauli Virtanen