[SciPy-user] optimization using fmin_bfgs with gradient information
Sebastian Walter
sebastian.walter@gmail....
Thu Jul 23 03:42:21 CDT 2009
1)
approx_fprime uses probably finite differences to approximate the
gradient, so at x = 2
it computes (f(2+epsilon) - f(2))/epsilon = (2+epsilon)**2/epsilon
which can be very large
2) When using AD to evaluate derivatives, only one path through the
control flow graph (which is defined by the if statement in your
function) is taken.
I.e. if x<2, AD will not know that for x>=2 it would have taken
another way through the control flow graph of your algorithm.
I.e. it would compute
f'(x<2) = 0 and f'(x>=2) = x**2 without realizing that the function
is nondifferentiable at that point.
3)
If you have such an objective function as you have shown above, then
your optimization problem is not well-formed because it does not have
a real minimizer x_*.
I'd assume that your objective function looks more like
f(x) = abs(x)
which is non-differentiable at x=0.
This kind of objective function destroys the convergence properties of
algorithms that assume continuously differentiable objective
functions, i.e. the convergence to the minimizer can be very slow.
For such problems, special purpose algorithms, e.g. the so-called
"bundle methods" can be used which converge superlinearly, as far as I
know.
Sebastian
2009/7/22 Ernest Adrogué <eadrogue@gmx.net>:
> 20/07/09 @ 10:43 (+0200), thus spake Sebastian Walter:
>> thanks for the function :).
>> It is now part of pyadolc's unit test.
>> http://github.com/b45ch1/pyadolc/blob/3cfff80c062d43e812379f4606eda0ebaaf4e82c/tests/complicated_tests.py
>> , Line 246
>>
>> I added two functions: one is providing the gradient by finite
>> differences and the other by using automatic differentiation.
>> The finite differences gradient has very poor accuracy, only the first
>> three digits are correct.
>
> So what your automatic differentiation does when it reaches
> a point where the function is not differentiable?
>
> Scipy's optimization.approx_fprime() returns an arbitrarily
> large value. For example, let's suppose that f(x) is
>
> In [164]: def f(x):
> if x[0] > 2:
> return x[0]**2
> return 0
>
> then the derivative of f(2) doesn't exist mathematically
> speaking. f'(x<2) is 0, and f'(x>2) is 2*x, if I understand
> correctly. This is the output of approx_fprime for function f:
>
> In [162]: [opt.approx_fprime((i,),f,eps) for i in numpy.linspace(1.95,2.05,9)]
> Out[162]:
> [array([ 0.]),
> array([ 0.]),
> array([ 0.]),
> array([ 0.]),
> array([ 2.68435460e+08]),
> array([ 4.02500004]),
> array([ 4.05000001]),
> array([ 4.07500005]),
> array([ 4.09999996])]
>
> As you can see, it returns a large number for f'(2).
> My question is, for the purposes of optimising f(x), what
> should my gradient function return at x=2, so that the
> optimisation algorithm works well. I would have said it should
> return 0, but seeing what approx_fprime does, I'm not sure any more.
>
> Ernest
>
> _______________________________________________
> SciPy-user mailing list
> SciPy-user@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
More information about the SciPy-User
mailing list