fminunc Converging at a strange point

I compared fminunc (least-squares fitting) with polyfit, but the results did not coincide.
Here is my code:
% clear session, close plots, clear screen
clear all; close all; clc
% data for regression
xm = linspace(0,4*pi,10);
ym = sin(xm);
% initial parameter guess
p0 = rand(1,8);
% define objective function (sum of squared errors)
objective = @(p) sum((yp(p)-ym).^2);
disp(['Initial Objective: ' num2str(objective(p0))])
% optimize with fminunc
%[X,FVAL,EXITFLAG,OUTPUT,GRAD,HESSIAN]
options = optimoptions(@fminunc,'Display','iter','Algorithm','quasi-newton');
[popt,fval,exitflag,output,grad,hessian]=fminunc(objective,p0,options);
%
plot(xm,ym,'ro')
hold on
plot(xm,yp(popt),'gs')
% compare with polyfit
p = polyfit(xm,ym,7);
y1 = polyval(p,xm);
plot(xm,y1,'o')
legend('measured','fminunc','polyfit')
ylabel('y')
xlabel('x')
yp(popt)
y1
function cc = yp(p)
xm = linspace(0,4*pi,10);
cc = p(8);
for i = 1:7
    cc = cc + p(i).*xm.^i;
end
end
I checked the gradient of the objective at the returned point, and it is far from satisfying the convergence condition.
I don't know why fminunc converges to such a strange solution. Could someone explain, or help me adjust the parameters?
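For reference, a minimal sketch of how to inspect what fminunc itself reports about convergence, using the outputs already requested in the call above:
% sketch: inspect fminunc's convergence diagnostics after the call
disp(exitflag)              % reason fminunc stopped (1 = gradient small enough)
disp(output.firstorderopt)  % first-order optimality measure at the solution
disp(norm(grad,inf))        % same quantity, computed from the returned gradient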
  3 Comments
Torsten
Torsten on 20 Aug 2022
Edited: Torsten on 20 Aug 2022
Yes, (4*pi)^7 is large. Determine the coefficients for the normalized polynomial as "polyfit" does (i.e. use centering and scaling). Look up the polyfit documentation.
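A minimal sketch of the centered-and-scaled usage referred to here; the three-output form of polyfit returns the centering/scaling parameters in mu:
% polyfit with three outputs centers and scales x internally:
% xhat = (x - mu(1))/mu(2), with mu(1) = mean(x) and mu(2) = std(x)
[p,S,mu] = polyfit(xm,ym,7);
yfit = polyval(p,xm,[],mu);  % evaluate with the same centering/scaling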
Bruno Luong
Bruno Luong on 20 Aug 2022
@Torsten I don't believe polyfit with a single output centers and normalizes the data.


Accepted Answer

Bruno Luong
Bruno Luong on 21 Aug 2022
@zhou caiwei "1. MATLAB has a variety of solutions; if one of them doesn't work, try others until the problem is solved.
2. MATLAB is complex; details make a difference."
I think you missed the most important lesson:
3. Make sure your parameter scales are comparable when working on numerical minimization (and on linear systems as well):
% clear session, close plots, clear screen
clear all; close all; clc
% data for regression
xm = linspace(0,4*pi,10);
ym = sin(xm);
% initial parameter guess
p0 = rand(1,8);
% normalize x to [-1,1], then define the objective function (sum of squared errors)
[xmax,xmin] = bounds(xm);
xmedian = (xmax+xmin)/2;
xdiff = (xmax-xmin)/2;
xn = (xm-xmedian)/xdiff;
objective = @(p) sum((ypn(p,xn)-ym).^2);
disp(['Initial Objective: ' num2str(objective(p0))])
Initial Objective: 24.1102
% optimize with fminunc
%[X,FVAL,EXITFLAG,OUTPUT,GRAD,HESSIAN]
options = optimoptions(@fminunc,'Display','iter','Algorithm','quasi-newton');
[pnopt,fval,exitflag,output,grad,hessian]=fminunc(objective,p0,options);
                                                        First-order
 Iteration  Func-count       f(x)        Step-size       optimality
     0           9          24.1102                          21.9
     1          18          5.93391     0.0457562            8.36
     2          27          4.35571             1            1.29
   ...
    20         189          0.72418             1          0.0348
   ...
    42         387        0.0205123             1          0.0256
   ...
    56         513       0.00189318             1        5.29e-05
    57         522       0.00189209             1        3.02e-05
    58         540       0.00189209       0.20013        2.45e-05
Computing finite-difference Hessian using objective function.
Local minimum possible.
fminunc stopped because it cannot decrease the objective function along the current search direction.
%
plot(xm,ym,'g-')
hold on
plot(xm,ypn(pnopt,xn),'ro')
% compare with polyfit
p = polyfit(xm,ym,7);
y1 = polyval(p,xm);
plot(xm,y1,'+b')
legend('measured','fminunc','polyfit')
ylabel('y')
xlabel('x')
ypn(pnopt,xn)
ans = 1×10
0.0002 0.9830 0.3491 -0.8826 -0.6179 0.6179 0.8826 -0.3491 -0.9831 -0.0002
y1
y1 = 1×10
0.0002 0.9830 0.3491 -0.8826 -0.6179 0.6179 0.8826 -0.3491 -0.9830 -0.0002
function cc = ypn(pn,xn)
cc = pn(8);
for i = 1:7
    cc = cc + pn(i).*xn.^i;
end
end
  2 Comments
Bruno Luong
Bruno Luong on 22 Aug 2022
Edited: Bruno Luong on 22 Aug 2022
Note that a polynomial in xn is still a polynomial in xm, since xn is an affine transformation of xm.
The coefficients of the optimal polynomial in xm can be derived from the pnopt returned by fminunc, as sketched below.
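A minimal sketch of that conversion, assuming pnopt, xmedian, and xdiff from the answer above (the reordering matches ypn's coefficient layout):
% compose the fitted polynomial in xn with xn = (xm-xmedian)/xdiff
% to recover polyval-ordered coefficients in xm (Horner-style composition)
pn = [pnopt(7:-1:1), pnopt(8)];   % ypn layout -> polyval order (highest power first)
lin = [1/xdiff, -xmedian/xdiff];  % the affine map as a degree-1 polynomial in xm
q = pn(1);
for k = 2:numel(pn)
    q = conv(q,lin);              % multiply the running polynomial by the affine map
    q(end) = q(end) + pn(k);      % add the next coefficient to the constant term
end
% q should now agree with polyfit(xm,ym,7) up to optimization accuracy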


More Answers (3)

John D'Errico
John D'Errico on 20 Aug 2022
Edited: John D'Errico on 20 Aug 2022
This is a common mistake. x varies over a moderately large range, which means the POWERS of x get very large, and in turn the coefficients of those terms must span a large range of magnitudes. That makes this a poorly posed problem. Both fminunc and polyfit will have serious numerical problems, despite this being, in theory, a simple problem. Remember that you are using only double precision arithmetic.
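To see the scale disparity concretely, here is a quick check of how the powers of x spread out at the right end of the interval:
% powers of x at x = 4*pi span roughly eight orders of magnitude,
% so the polynomial coefficients must span a similarly wide range
format short g
disp((4*pi).^(0:7))   % ranges from 1 up to about 4.95e7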
So what are you trying to do?
xm = linspace(0,4*pi,10);
ym = sin(xm);
Ok, a REALLY bad idea is to try to fit a model with 8 unknown coefficients, using ONLY 10 data points. That is just silly.
xm = linspace(0,4*pi,100);
ym = sin(xm);
So xm grows as large as 4*pi. The series itself SHOULD be:
syms X
Taylor_sine = taylor(sin(X),'Order',8)
Taylor_sine = x - x^3/6 + x^5/120 - x^7/5040
As you can see though, if x is on the order of
4*pi
ans = 12.5664
This is all still computable in double precision arithmetic, although you are getting close to the edge. We can tell that by looking at the condition number of the associated Vandermonde matrix.
cond(xm'.^(0:7))
ans = 3.9432e+08
So the solution would completely fail in SINGLE precision arithmetic, but it will survive in double precision arithmetic. Luckily, your data has no noise in it at all.
So what is the real reason why things seem to be performing poorly?
First, see how polyfit does:
format long g
poly7 = polyfit(xm,ym,7)
poly7 = 1×8
-5.57605384659417e-05 0.00245247657206736 -0.0411779076985449 0.325443180389698 -1.18418151628107 1.50773833414353 0.126075267256452 0.101900826945285
fplot(@(x) polyval(poly7,x),'r',[0,4*pi])
hold on
fplot(@sin,'b',[0,4*pi])
legend('Polyfit approximation to sin(x)','sin(x)')
hold off
So the fit itself looks to be not obscenely terrible, yet I see these coefficients are nothing close to what I would expect for a true sine series. (I'm actually surprised the fit is as close as it came.) There are significant areas of lack of fit though. (Look at what happens near 0 and 4*pi, as well as how the fit misses the peaks and valleys.)
Next, remember that in order to get any true convergence for a SINE series going out as far as x=12, you would need many more terms than just an x^7 term. That means your polynomial must be of significantly higher order than 7th degree. But if you were to use a higher order polynomial than that, then even double precision arithmetic will fail you. Just for kicks, let's look at what the true truncated Taylor series produces for that function over that domain.
fplot(Taylor_sine,'r',[0,4*pi])
hold on
fplot(@sin,'b',[0,4*pi])
legend('Taylor series approximation to sin(x)','sin(x)')
hold off
Do you see the problem? You cannot use such a low order polynomial to well approximate the sine function over that domain. It is not even close to convergent, beyond roughly 2*pi, but you are going out as far as 4*pi. Had I limited the domain to even 2*pi, the approximation would still be seen to have numerical problems.
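A quick way to see this numerically: the odd Taylor terms x^n/n! of sin(x) at x = 4*pi keep growing until n exceeds x (about 12.6) and only then start to decay, so a degree-7 truncation is still in the growing part of the series. A minimal sketch:
% magnitudes of the sine Taylor terms at x = 4*pi
n = 1:2:41;                       % odd powers that appear in the sine series
t = (4*pi).^n./factorial(n);
semilogy(n,t,'o-')
xlabel('term degree n')
ylabel('|x^n/n!| at x = 4\pi')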
Fminunc will of course have serious numerical problems too, but I can stop here: fminunc is a completely different algorithm, and it will have its own completely different problems. You should expect numerical issues no matter what you do in this problem. If you do get a fit, it will look nothing like what you would expect the truncated sine series coefficients to be. If you use a higher order model, then the fit will be complete crap.
Added:
Fminunc. What happens? First, if you use only 10 data points, expect it to be complete crap. And you used a random start point for the initial value. SIGH. Garbage in, garbage out.
When you start with a REALLY bad initial value for the coefficients, the optimizer sees large residuals to start with. That helps the optimizer get lost and diverge. For example, first with only the 10 points you had:
xm = linspace(0,4*pi,10);
ym = sin(xm);
format short
p0 = rand(1,8)
p0 = 1×8
0.0038 0.5609 0.2456 0.7997 0.1813 0.6704 0.9381 0.0775
These are terrible start points. RANDOM START POINTS ARE OFTEN A BAD IDEA.
% far simpler than using a function that you wrote. LEARN TO USE EXISTING CODE!!!!
yp = @(P) polyval(P,xm);
Before we do a fit, look at what yp returns:
yp(p0)
ans = 1×10
1.0e+06 * 0.0000 0.0000 0.0004 0.0037 0.0198 0.0741 0.2190 0.5504 1.2265 2.4921
Do you see the initial polynomial is complete CRAPOLA? Any random polynomial will be that. Remember what I said. You would expect this to be true. These are TERRIBLY poor starting values for a polynomial fit. So the solver can get lost.
% define objective function (scaled sum of squared errors)
objective = @(p) sum((yp(p)-ym).^2);
disp(['Initial Objective: ' num2str(objective(p0))])
Initial Objective: 8071607107421.342
Do you see that the initial objective function is HUGE at the start point?
You have too little data. You have terrible start points. Any optimizer will have problems.
% optimize with fminunc
%[X,FVAL,EXITFLAG,OUTPUT,GRAD,HESSIAN]
options = optimoptions(@fminunc,'Display','iter','Algorithm','quasi-newton');
[popt,fval,exitflag,output,grad,hessian]=fminunc(objective,p0,options);
                                                        First-order
 Iteration  Func-count       f(x)        Step-size       optimality
     0           9      8.07161e+12                      3.11e+14
     1          36      3.29679e+10   1.65488e-16        1.05e+11
     2          81        2.038e+10          7381        3.12e+11
     3          90      2.65269e+06             1        8.34e+09
     4          99      2.64701e+06             1        4.62e+07
Computing finite-difference Hessian using objective function.
Local minimum found.
Optimization completed because the size of the gradient is less than the value of the optimality tolerance.
format short g
popt
popt = 1×8
0.0023033 -0.044804 0.1362 0.78465 0.17943 0.67017 0.93804 0.077495
fval
fval =
2.647e+06
So even at the final point, the objective is relatively huge. Remember that fminunc is NOT a least squares solver, but a general optimizer. So that too is an issue.
Perhaps a better choice of starting values would simply be:
Myp0 = 1./factorial(7:-1:0).*repmat([1 .01],1,4)
Myp0 = 1×8
0.00019841 1.3889e-05 0.0083333 0.00041667 0.16667 0.005 1 0.01
yp(Myp0)
ans = 1×10
1.0e+00 * 0.01 1.9177 8.1808 31.975 114.53 365.12 1032.1 2615.6 6031.7 12839
These are at least a little smaller numbers, so the solver may have an easier time of it. But what are the default values for the tolerance? TolFun for fminunc is 1e-6. So once the objective function has been reduced from 1e15 to 1e6, and the gradient is now relatively small, fminunc thinks it is done.
options.TolFun = 1e-13; % since we are so far away initially, we will need a massive reduction.
[popt,fval,exitflag,output,grad,hessian]=fminunc(objective,Myp0,options);
                                                        First-order
 Iteration  Func-count       f(x)        Step-size       optimality
     0           9      2.09284e+08                      1.58e+12
     1          54           369657   1.65485e-16        3.47e+08
     2          72           369479            10        5.61e+08
Computing finite-difference Hessian using objective function.
Local minimum possible.
fminunc stopped because the size of the current step is less than the value of the step size tolerance.
popt
popt = 1×8
-6.3503e-05 -8.185e-06 0.0083315 0.00041651 0.16667 0.005 1 0.01
fval
fval =
3.6948e+05
Still crap. But this may just be getting at the idea that fminunc is not designed to solve that class of problem, at least not without problems of its own.
fplot(@(x) polyval(popt,x),'r',[0,4*pi])
hold on
fplot(@sin,'b',[0,4*pi])
plot(xm,ym,'go')
legend('Fminunc approximation to sin(x)','sin(x)','data')
hold off
It fit the first couple of points ok, but then things got out of hand. Poor starting values are a problem, and mine were not much better. It looks like the solver starts by effectively chasing the last data point, because that is the point with the largest residual from the starting values. But then it gets lost.
  5 Comments
zhou caiwei
zhou caiwei on 21 Aug 2022
@John D'Errico @Torsten Really knowledgeable answers. I need to go back and review numerical analysis and numerical optimization.



Torsten
Torsten on 20 Aug 2022
Edited: Torsten on 20 Aug 2022
xm = linspace(0,4*pi,10).';
ym = sin(xm);
% initial parameter guess
p0 = rand(1,8);
% define residual function for lsqnonlin (it minimizes the sum of squared residuals)
yp = @(p) sum(p.*xm.^(0:7),2);
objective = @(p) yp(p)-ym;
options = optimset('TolFun',1e-8,'TolX',1e-8);
[p1,res] = lsqnonlin(objective,p0,[],[],options)
Local minimum possible.
lsqnonlin stopped because the final change in the sum of squares relative to its initial value is less than the value of the function tolerance.
p1 = 1×8
0.0002 -0.1141 1.9084 -1.3808 0.3702 -0.0464 0.0028 -0.0001
res = 0.0019
y1 = yp(p1);
p2 = polyfit(xm,ym,7);
p2flip = fliplr(p2)
p2flip = 1×8
0.0002 -0.1141 1.9084 -1.3808 0.3702 -0.0464 0.0028 -0.0001
y2 = polyval(p2,xm);
plot(xm,ym)
hold on
plot(xm,y1)
plot(xm,y2)

zhou caiwei
zhou caiwei on 21 Aug 2022
Wow, MATLAB is really broad and profound. Thank you all for your inspiring discussions. Maybe polyfit works because it uses a Legendre expansion, hah. I draw two conclusions from your answers: 1. MATLAB has a variety of solutions; if one of them doesn't work, try others until the problem is solved. 2. MATLAB is complex; details make a difference.
