Why does the trust-region-reflective algorithm in fmincon require a gradient as input?

Hello
According to the MATLAB fmincon documentation, the 'trust-region-reflective' algorithm requires the objective gradient to be specified.
1- Why does it need the gradient as input?
2- Does the objective function have to be analytical? My objective function is a function handle that calculates a scalar as output.
Thanks for your help in advance

12 Comments

The objective function has to be differentiable and return a scalar value.
This makes sense: how would you minimize a matrix?
Thank you @Torsten for your response.
I calculated the average of my matrix, and the output is a scalar now (I have edited the main question as well).
But my question is about why we must provide the gradient as input when using the 'trust-region-reflective' algorithm, while for other algorithms such as SQP it is not needed. This is stated on the fmincon documentation page.
I quote:
The trust-region-reflective algorithm requires:
  • A gradient to be supplied in the objective function
  • SpecifyObjectiveGradient to be set to true
  • Either bound constraints or linear equality constraints, but not both
If you select the 'trust-region-reflective' algorithm and these conditions are not all satisfied, fmincon throws an error.
Thanks
My guess is that for the trust-region-reflective algorithm it is very important that the gradient of the objective function is calculated with high precision. That's why the user is requested to supply it (at best analytically).
Of course, all the other methods also need the gradient of the objective, but it will be calculated within fmincon by numerical differentiation.
But why care about it - you won't be able to alter the requirements MATLAB imposes.
I was under the impression that, mathematically, it is impossible to use the trust-region-reflective (TRA) algorithm without having the gradient of the objective function.
I want to know whether there is a mathematical obstacle, a numerical issue, or a convergence issue. Other methods do not mandate providing the gradient as input, while for TRA it is compulsory; otherwise fmincon will not work.
Given this, to my understanding, TRA can only be used in fmincon if the objective function is available analytically so that you can calculate its gradient; otherwise it is impossible to use TRA as an optimisation algorithm.
I want to know the logic behind the fact that MATLAB has made the gradient mandatory for the TRA algorithm.
I care about it because I intend to use TRA as the algorithm, but I do not have an analytical function. Thus, it seems that I cannot use it.
Thanks
I tried to explain the logic behind it. Please read my response more carefully.
The essence is that all algorithms need the gradient of the objective, but that it is vital for the trust-region-reflective algorithm that the gradient is supplied with high precision. That's why the user is explicitly requested to supply it, at best analytically.
Of course you can also use TRA when you have no analytical derivative of the objective, namely by calculating the gradient of the objective numerically, but I don't know whether this is precise enough for TRA to succeed. Especially since you said that you do not even know whether your objective is differentiable. If not, better use ga instead of fmincon.
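As a rough illustration of that fallback (the objective and bounds below are stand-ins invented for this sketch, not anything from this thread), ga can be run on a possibly non-smooth objective without any derivative information:

```matlab
% Illustrative sketch only: ga needs no gradient, so it tolerates
% objectives that may be non-differentiable (abs, max, branching, etc.).
fun   = @(x) sum(abs(x)) + max(x);   % stand-in non-smooth objective
nvars = 5;                           % small stand-in problem size
lb = -ones(1, nvars);
ub =  ones(1, nvars);
% Bound-constrained call: ga(fun, nvars, A, b, Aeq, beq, lb, ub)
x  = ga(fun, nvars, [], [], [], [], lb, ub);
```

The trade-off is that ga typically needs far more function evaluations than a gradient-based method, so it is usually a last resort when smoothness is in doubt.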
Sorry, I thought that since you wrote "I guess", you were not sure and were assuming it. My mistake.
Would you please advise me whether there is a way to use the TRA algorithm and provide a gradient when I do not have an analytical function?
My problem is:
I have 120 variables (x) and am trying to minimise my objective function (f). The objective function is calculated numerically at each iteration, and I cannot calculate its gradient since there is no explicit formula.
Currently, I get some results via the "SQP" and "interior-point" algorithms, but I want to try the TRA algorithm.
Do you know how I can use TRA when the function handle is not analytical?
Thank you
(f(x(1),x(2),...,x(i)+h,x(i+1),...,x(120)) - f(x(1),x(2),...,x(120))) / h
gives you an approximation to the i-th element of the gradient of the objective function f, where h is a small number (e.g. 1e-8).
There are several MATLAB codes in the File Exchange that calculate numerical gradients of a scalar function (I think Matt J gave a link to one of these codes).
But when using fmincon, make sure that your objective and your constraints are really differentiable with respect to the variables. If not, better use ga.
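The forward-difference formula above could be coded as a small helper; the name numgrad and the step size h are illustrative choices for this sketch, not something prescribed by fmincon:

```matlab
% Sketch of a forward-difference gradient of a scalar function f at x.
% Costs one extra function evaluation per variable (121 total for 120 variables).
function g = numgrad(f, x)
    h  = 1e-8;              % step size: too large biases, too small amplifies round-off
    fx = f(x);              % baseline value, reused for every component
    g  = zeros(numel(x), 1);
    for i = 1:numel(x)
        xh    = x;
        xh(i) = xh(i) + h;  % perturb only the i-th variable
        g(i)  = (f(xh) - fx) / h;
    end
end
```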
Thank you @Torsten, For the answers.
I look into this approximation.
I have developed a two-step optimisation: I first run "ga", then use its output as the starting point for "fmincon" to get the derivatives at the optimum in order to evaluate the optimality condition.
I do not see anything in there that requires analytic solutions, but I might have missed something.
"I have 120 variables (x) and trying to minimise my objective function (f). The objective function is calculated numerically at each iteration, and I cannot calculate its gradient since there is no explicit formula."
This statement is simply not true. I have minimized a function with 1e6 variables and no explicit formula (solving a PDE), and I can compute the exact gradient by the chain rule and an adjoint equation. The only requirement is that the code that computes your objective function uses only differentiable functions and operators (so no min, max, if) within it.
I could imagine situations, such as the function involving solving integral equations, where no closed-form formula exists. For example, ODEs often have a general solution involving the exp() of the sum of the roots of an integral equation.


 Accepted Answer

Does the objective function have to be analytical? My objective function is a function handle that calculates a scalar as output.
I'm not sure what difference you see between a function being "analytical" and "specified by a function handle". If you mean, does fmincon require the objective function be specified as a single-line mathematical formula, the answer is no.
Why does it need gradient as input?
I believe it is because the trust-region algorithm is intended for problems of large dimension. If so, it would create a pitfall for users if finite difference approximations to the gradient were used by default, since for high dimensional problems, that would be very expensive computationally.
Regardless, even though the trust-region algorithm requires you to supply your own gradient calculation, you are of course free to do that calculation with your own finite difference implementation, or with one from the File Exchange (Example). However, as Torsten says, it might also be that the algorithm is sensitive to errors in the derivative calculations, so I'm not sure how well that would work.
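A minimal sketch of that workaround, assuming a bound-constrained problem (the quadratic objective, the problem size, and the finite-difference step here are stand-ins, not anything from the thread):

```matlab
function demo_trr
    % Supply our own gradient so fmincon accepts the
    % trust-region-reflective algorithm.
    opts = optimoptions('fmincon', ...
        'Algorithm', 'trust-region-reflective', ...
        'SpecifyObjectiveGradient', true);
    n  = 5;                            % stand-in for the 120 variables
    lb = -ones(n, 1);                  % bounds only: TRR allows bounds or
    ub =  ones(n, 1);                  % linear equalities, but not both
    x0 = 0.5 * ones(n, 1);
    x  = fmincon(@objWithGrad, x0, [], [], [], [], lb, ub, [], opts);
end

function [f, g] = objWithGrad(x)
    f = sum(x.^2);                     % stand-in scalar objective
    if nargout > 1                     % gradient requested by fmincon
        h = 1e-8;
        g = zeros(numel(x), 1);
        for i = 1:numel(x)
            xh = x;  xh(i) = xh(i) + h;
            g(i) = (sum(xh.^2) - f) / h;   % forward difference
        end
    end
end
```

The `nargout > 1` guard keeps the gradient loop from running when fmincon only asks for the function value.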

1 Comment

Thank you, @Matt J for the clarification. I understand that there is no mathematical obstacle, just programming considerations.


More Answers (2)

What you, @SM, are looking for are Derivative-Free Optimization methods. They do not require derivatives or analytical formulations of the objective/constraint functions. You may have a look at PDFO (Powell's Derivative-Free Optimization solvers):
I don't think Matt's answer is correct; Torsten's answer is better. The trust-region algorithm must approximate the objective function within a current "trust region". It typically uses the cost function and gradient at many points to build a quadratic approximation. That requires both the cost function and gradient evaluations to be continuous with respect to the points at which the gradients (and the objective) are evaluated. Therefore numerical finite differences with an adaptive step are not suitable, since there is no known method to make the step vary continuously with the point.
Therefore the TR algorithm requires the user to supply the "exact" gradient. Otherwise it won't converge.
Better still, if the user provides a Hessian-times-vector calculation, it will help TR work even more efficiently.

5 Comments

I don't follow that reasoning. By default, the TR method computes the Hessian using finite differences. I don't see why TR would be tolerant of finite difference approximations to the Hessian, but not the gradient.
They use the (L-)BFGS formula to approximate the Hessian from the gradients, not finite differences. The difference from other methods is that TR has to solve a subproblem using the quadratic model (thus the Hessian), not just a descent direction (1D), which is more or less d := -H*g in many cases. Therefore the requirement that the gradient be smooth is more important.
They use the (L-)BFGS formula to approximate the Hessian from the gradients, not finite differences.
You're thinking of the interior-point method. For TR, the documentation says,
OK, I missed that.
But my point does not change: if they compute H by finite differences on g, it requires g to be exact so that H is meaningfully estimated. The bottom line is that TR requires more accuracy on g, not because of the computation time.
Interesting to follow your discussion. We have a recent application of TR to energy minimization ( link ) using the finite element method. It converges reasonably even with an approximated gradient.


Release

R2021a

Asked:

SM
on 25 Feb 2022

Commented:

Jan
on 27 Jun 2023
