[SciPy-dev] optimizers module
Mon Aug 20 12:40:18 CDT 2007
Matthieu Brucher wrote:
> Hi again ;)
> I have already committed
> qubic should be cubic, no ?
yes, I will rename it now
> the problems:
> 1. I have implemented stop tolerance x as self.minStepSize.
> However, isn't it more correct to observe |x_prev - x_new|
> according to
> given by user xtol, than to observe |alpha_prev - alpha_new|
> to xtol? If the routine is called from a multi-dimensional NL problem
> with known xtol, provided by user, I think it's more convenient
> and more
> correct to observe |x_prev - x_new| instead of |alpha_prev -
> as stop criterion.
> The basic cubic interpolation works on alpha. If you want to implement
> another based on x, not problem. I think that as a first step, we
> should add standard algorithms that are documented and described.
> After this step is done, we can explore.
yes, but all your solvers are already written in terms of n-dimensional
problem (x0 and direction, both are of nVars size), so it would be more
natural to use xtol (from general problem), not alpha_tol (from
> 2. (this is primarily for Matthieu): where should be gradtol taken
> It's main stop criterion, according to alg.
> Currently I just set it to 1e-6.
> It should be taken in the constructor (see the damped_line_search.py
> for instance)
> 3. Don't you think that maxIter and/or maxFunEvals rule should be
> (I ask because I didn't see that ones in Matthieu's
> quadratic_interpolation solver).
> That is a good question that raised also in our discussion for the
> Strong Wolfe Powell rules, at least for the maxIter.
As for SWP, I think a check should be made if solution with required c1
and c2 can't be obtained and/or don't exist at all.
For example objFun(x) = 1e-5*x while c1 = 1e-4 (IIRC this is the example
where I encountered alpha = 1e-28 -> f0 = f(x0+alpha*direction)). The
SWP-based solver should produce something different than CPU hangup. OK,
it turned to be impossible to obtain new_X that satisfies c1 and c2, but
an approximation very often will be enough good to continue solving the
NL problem involved. So I think check for |x_prev - x_new | < xtol
should be added and will be very helpful here. You have something like
that with alphap in line s68-70 (swp.py) but this is very unclear and I
suspect for some problems may be endless (as well as other stop creteria
implemented for now in the SWP).
> It will make algorithms more stable to
> CPU-hanging errors because of our errors and/or special funcs
> I had implemented that ones but since Matthieu didn't have them in
> quadratic_interpolation, I just comment out the stop criteria (all
> I can
> do is to set my defaults like 400 or 1000 (as well as gradtol
> 1e-6), but
> since Matthieu "state" variable (afaik) not have those ones - I can't
> take them as parameters).
> So should they exist or not?
> If you want to use them, you should put them in the __init__ method as
> The state could be populated with everything, but that would mean very
> cumbersome initializations. On one hand, you should create each module
> with no parameter and pass all of them to the optimizer. That could
> mean a very long and not readable line. On the other hand, you should
> create the optimizer, create then every module with the optimizer as a
> parameter. Not intuitive enough.
> This is were the limit between the separation principle and the object
> orientation is fuzzy.
> So the state dictionary is only responsible for what is specifically
> connected to the function. Either the parameters, or different
> evaluations (hessian, gradient, direction and so on). That's why you
> "can't" put gradtol in it (for instance).
I'm not know your code very good yet, but why can't you just set default
params as I do in /Kernel/BaseProblem.py?
And then if user wants to change any specific parameter - he can do it
very easy. And no "very long and not readable line" are present in my code.
> I saw that you test for the presence of the gradient method, you
> should not. If people want to use this line search, they _must_
> provide a gradient. If they can't provide an analytical gradient, they
> can provide a numerical one by using
> helpers.ForwardFiniteDifferenceDerivatives. This is questionable, I
> know, but the simpler the algorithm, the simpler their use, their
> reading and debugging (that way, you can get rid of the f_and_df
> function as well or at least of the test).
I still think the approach is incorrect, user didn't ought to supply
gradient, we should calculate it by ourselves if it's absent. At least
any known to me optimization software do the trick. As for
helpers.ForwardFiniteDifferenceDerivatives, it will take too much time
for user to dig into documentation to find the one.
Also, as you see my f_and_df is optimized to not recalculate f(x0) while
gradient obtaining numerically, like some do, for example approx_fprime
in scipy.optimize. For problems with costly funcs and small nVars (1..5)
speedup can be significant.
Of course, it should be placed in single file for whole "optimizers"
package, like I do in my ObjFunRelated.py, not in qubic_interpolation.py.
But it would be better would you chose the most appropriate place (and /
or informed me where is it).
More information about the Scipy-dev