[SciPy-dev] scipy.optimize.nonlin rewrite

Pauli Virtanen pav@iki...
Sat Nov 29 13:40:45 CST 2008


Hi all,

I spent some time rewriting the scipy.optimize.nonlin module:

    http://github.com/pv/scipy/tree/ticket-791/scipy/optimize/nonlin.py

The following things changed:

- Support tolerance-based stopping conditions (cf. ticket #791)

- Improved handling of the input and return values of the objective
  functions, so that the routines are now easier to use.

- Don't use np.matrix at all.

- Jacobian approximations are factored into classes; the iteration code
  now lives in a single place, so trust-region handling or other
  improvements can be added there later on.

- There's now a line search along the direction obtained from inverting
  the Jacobian. (But there's no check that it is an actual descent
  direction for some merit function; see the sketch after this list.)

- Rewrote docstrings.
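
A minimal illustration of the kind of line search meant above
(backtracking on the merit function ||F(x)||^2; this is a sketch of
mine, not the exact code in nonlin.py):

    import numpy as np

    def backtracking_line_search(F, x, dx, sigma=0.5, max_halvings=10):
        # Halve the step length until the merit function ||F||^2
        # decreases, or give up after max_halvings tries.
        f0 = np.linalg.norm(F(x))**2
        t = 1.0
        for _ in range(max_halvings):
            if np.linalg.norm(F(x + t*dx))**2 < f0:
                break
            t *= sigma
        return x + t*dx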

The routines should produce the same output as previously. The tests are
still not very strong, however.


But I now have some questions:

* Getting this into 0.7.0: since this is a complete rewrite, is it too
  late, and would it be better to wait for 0.8?

* Some of the algorithms in there don't appear to work too well, and
  some appear to be redundant. I'd like to clean this up a bit, leaving
  only known-good stuff in.

* I'd like to remove `broyden2`, as the actual Jacobian approximation
  in it appears to be the same as in `broyden3`, and there does not
  appear to be much difference in the work the two involve (see the
  sketch below).

  Ondrej, since you wrote the original code, do you think there is
  a reason to keep both?
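
  For concreteness, here is my reading of the shared update (a sketch,
  not the code verbatim): both routines apply Broyden's second method
  to the inverse Jacobian approximation G,

      G := G + (s - G y) y^T / ||y||^2    (s = dx, y = dF),

  with `broyden2` storing G as a dense matrix and `broyden3` storing
  the rank-1 terms instead, so the work involved is comparable:

      import numpy as np

      def bad_broyden_update(G, s, y):
          # Broyden's second ("bad") method on the inverse Jacobian;
          # after the update G satisfies the secant condition G y = s.
          return G + np.outer(s - G.dot(y), y) / y.dot(y)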

* If you work out the matrix algebra, `broyden1_modified` appears, in
  the end, to update the inverse Jacobian in a way that corresponds to

      J := J + (y - J s / 2) s^T / ||s||^2

  for the Jacobian (with s = dx, y = df). Apart from the factor 1/2,
  this is Broyden's good method. [1] One can also verify that the
  updated inverse Jacobian does not satisfy the quasi-Newton condition
  s = J^{-1} y, and that `broyden1_modified` doesn't generate the same
  sequence as `broyden1`; a quick numerical check is included after
  the references below.

  Hence, I'd like to remove this routine, unless there's some literature
  that shows that the above works better than Broyden's method; Ondrej, 
  do you agree?

  .. [1] http://en.wikipedia.org/wiki/Broyden%27s_method
         http://en.wikipedia.org/wiki/Sherman%E2%80%93Morrison_formula
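
  As the promised numerical check (my example, not code from the
  module): the update with the factor 1/2 violates the secant
  condition J s = y, while Broyden's good update satisfies it exactly:

      import numpy as np

      rng = np.random.RandomState(0)
      J = rng.rand(4, 4); s = rng.rand(4); y = rng.rand(4)

      J_mod  = J + np.outer(y - J.dot(s)/2, s) / s.dot(s)  # update above
      J_good = J + np.outer(y - J.dot(s), s) / s.dot(s)    # Broyden's good

      print(np.allclose(J_mod.dot(s), y))    # False
      print(np.allclose(J_good.dot(s), y))   # True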

* Also, which articles were used as references for the
  non-quasi-Newton algorithms:

  - `broyden_modified`. This appears to be a bit exotic, and it uses
    several magic constants (`w0`, `wl`) whose meaning is not clear to
    me.

    A reference would be helpful here, also for users who have to
    choose these parameters appropriately for their specific problems.

  - `broyden_generalized`, `anderson`, `anderson2`. These appear to be
    variants of Anderson mixing, so we probably want to keep at most
    one of them. Also, am I correct that broyden_generalized() ==
    anderson(w0=0)?

    `anderson` and `anderson2` don't appear to behave equivalently, and
    I suspect the update formula in the latter is wrong, since that
    algorithm can't solve any of the test problems. Do you have a
    reference for it?

    Is there a rule for how `w0` should be chosen for a specific
    problem?

  - `excitingmixing`. A reference would be useful, to clarify the
    heuristics used.

  - `linearmixing`. I'm not sure that Scipy should advocate this
    method :)

  Linearmixing and excitingmixing also seem to require something from
  the objective function, possibly that the eigenvalues of its Jacobian
  have the opposite sign to `alpha`. For example, neither of them can
  find a solution of the equation ``I x = 0``, where ``I`` is the
  identity matrix (test function F2); see the illustration below. So
  I'm a bit tempted to remove these routines as well, since they are
  probably not too useful for general problems.
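
  To illustrate the sign dependence (my example): for F(x) = x, linear
  mixing iterates x := x + alpha*F(x) = (1 + alpha)*x, which contracts
  only when -2 < alpha < 0:

      import numpy as np

      def linear_mixing(F, x, alpha, n_iter=50):
          # naive fixed-point iteration x := x + alpha*F(x)
          for _ in range(n_iter):
              x = x + alpha*F(x)
          return x

      F = lambda x: x                  # I x = 0, i.e. test function F2
      x0 = np.ones(3)
      print(linear_mixing(F, x0, 0.5))   # diverges: |1 + alpha| > 1
      print(linear_mixing(F, x0, -0.5))  # converges towards 0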

* The code in there is still a fairly naive implementation of the
  inexact (quasi-)Newton method, and one could eg. add trust-region
  handling or try to guarantee that the line search direction is always
  a descent direction for a merit function. (I don't know if the latter
  is possible unless one has a matrix representation of the Jacobian
  approximation; see the sketch below.) So I suspect there are problems
  for which eg. MINPACK code will find solutions, but for which the
  nonlin.py code fails.
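
  A sketch of the check I have in mind, assuming a matrix
  representation G of the inverse-Jacobian approximation is available:
  the step d = -G F(x) is a descent direction for the merit function
  f = ||F||^2 / 2 exactly when grad f . d = (J^T F) . d < 0.

      import numpy as np

      def is_descent_direction(J, G, Fx):
          # d = -G Fx descends for 0.5*||F||^2 iff (J^T Fx) . d < 0
          d = -G.dot(Fx)
          return J.T.dot(Fx).dot(d) < 0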

* One could add more algorithms suitable for large-scale problems; for
  example some limited-memory Broyden methods (eg. [2]) or
  secant-Krylov methods [3].

  .. [2] http://www.math.leidenuniv.nl/~verduyn/publications/reports/equadiff.ps
  .. [3] D. A. Knoll and D. E. Keyes. Jacobian-free Newton-Krylov
         methods: a survey of approaches and applications. Journal of
         Computational Physics, 193(2):357-397, 2004.

  I have implementations for both types of algorithms that could
  possibly go in after some polishing.
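
  For the secant-Krylov case, the key point of [3] is that the
  Jacobian-vector product can be approximated by a finite difference,
  so the Jacobian is never formed. A rough sketch of one such
  Newton-Krylov step (my own, built on scipy.sparse.linalg, not the
  implementation mentioned above):

      import numpy as np
      from scipy.sparse.linalg import LinearOperator, gmres

      def newton_krylov_step(F, x, eps=1e-7):
          Fx = F(x)
          # Jacobian-free matvec: J(x) v ~= (F(x + eps*v) - F(x)) / eps
          Jv = lambda v: (F(x + eps*v) - Fx) / eps
          J = LinearOperator((x.size, x.size), matvec=Jv)
          dx, info = gmres(J, -Fx)     # inexact Newton step
          return x + dx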

-- 
Pauli Virtanen



