[SciPy-dev] optimizers module

Matthieu Brucher matthieu.brucher@gmail....
Mon Aug 20 13:27:47 CDT 2007

> > The basic cubic interpolation works on alpha. If you want to implement
> > another based on x, not problem. I think that as a first step, we
> > should add standard algorithms that are documented and described.
> > After this step is done, we can explore.
> yes, but all your solvers are already written in terms of n-dimensional
> problem (x0 and direction, both are of nVars size), so it would be more
> natural to use xtol (from general problem), not alpha_tol (from
> line-search subproblem)

xtol is not available everywhere, it's something that is
criterion-dependent. It might be given or not. The documented algorithm is
based on alpha, and our main implementation should follow the documented
algorithm as well.

> > That is a good question that raised also in our discussion for the
> > Strong Wolfe Powell rules, at least for the maxIter.
> As for SWP, I think a check should be made if solution with required c1
> and c2 can't be obtained and/or don't exist at all.
> For example objFun(x) = 1e-5*x while c1 = 1e-4 (IIRC this is the example
> where I encountered alpha = 1e-28 -> f0 = f(x0+alpha*direction)). The
> SWP-based solver should produce something different than CPU hangup. OK,
> it turned to be impossible to obtain new_X that satisfies c1 and c2, but
> an approximation very often will be enough good to continue solving the
> NL problem involved.

According to the theory, a solution is found is the direction is correct and
if the function is continuously differentiable. But I agree that some
additional tests must be added.

So I think check for |x_prev - x_new | < xtol
> should be added and will be very helpful here. You have something like
> that with alphap in line s68-70 (swp.py) but this is very unclear and I
> suspect for some problems may be endless (as well as other stop creteria
> implemented for now in the SWP).

This check is done because of the interpolation nature of the WP algorithm.
You can't put |x_prev_x_new| < xtol as a stopping criteria. This is why your
idea of limiting the number of iterations is relevant.

> So the state dictionary is only responsible for what is specifically
> > connected to the function. Either the parameters, or different
> > evaluations (hessian, gradient, direction and so on). That's why you
> > "can't" put gradtol in it (for instance).
> I'm not know your code very good yet, but why can't you just set default
> params as I do in /Kernel/BaseProblem.py?

Because there are a lot of default parameters that could be set depending on
the algorithm. From an object-oriented point of view, this way of doing
things is correct: the different modules posess the arguments because they
are responsible for using them. Besides, you may want a different gradient
tolerance for different sub-modules.

And then if user wants to change any specific parameter - he can do it
> very easy. And no "very long and not readable line" are present in my
> code.

It won't be in your code, it would be in the user code. The goal of the
framework is to make things readable and usable. If every parameter is only
given to the optimizer, the other submodules are no longer of interest.
With the parameters given to each specific module with respect of their
belonging, one can create several instances of a specific module to use for
different optimizers. This is what object-oriented code is intended for.
I've done this in my PhD code, and it works very well.

I still think the approach is incorrect, user didn't ought to supply
> gradient, we should calculate it by ourselves if it's absent. At least
> any known to me optimization software do the trick.

Perhaps, but I have several reasons :
- when it's hidden, it's a magic trick. My view of the framework is that it
must not do anything like that. It's designed for advanced users that do not
want those tricks
- from an architectural point of view, it's wrong, plainly wrong. I'm a
scientist specialized in electronics and signal processing, but I have high
requirements for everything that is IT-oriented. Encapsulation is one part
of the object principle, and implementing finite-difference outside breaks
it (and I'm not talking about code duplication).

As for
> helpers.ForwardFiniteDifferenceDerivatives, it will take too much time
> for user to dig into documentation to find the one.

Not if it's clearly stated in the documentation, perhaps it can be enhanced.
It's on the first optimization page in the scikit wiki, with examples.

Also, as you see my f_and_df is optimized to not recalculate f(x0) while
> gradient obtaining numerically, like some do, for example approx_fprime
> in scipy.optimize. For problems with costly funcs and small nVars (1..5)
> speedup can be significant.

Yes, I agree. For optimization, an additional argument could be given to the
gradient that will be used if needed (remember that the other way of
implementing finite difference do not use f(x0)), but it will bring some
trouble to the user (every gradient function must have this additional
This should be discussed in a specific topic, because I don't think there is
a simple answer to this specific point (and as I stated before I prefer
something complete but not as fast than something incomplete but very fast,
and there is nothing like free lunch in IT, we have to make compromises).

Of course, it should be placed in single file for whole "optimizers"
> package, like I do in my ObjFunRelated.py, not in qubic_interpolation.py.
> But it would be better would you chose the most appropriate place (and /
> or informed me where is it).

It's already implemented (both versions) and available.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://projects.scipy.org/pipermail/scipy-dev/attachments/20070820/689b80d1/attachment.html 

More information about the Scipy-dev mailing list