Improving fit of data to sum of cosines model

10 views (last 30 days)
I am trying to use fitnlm to fit a potential energy versus dihedral rotation angle dataset. The fit is rather poor. My guess is that this is due to the solver thinking that it is close to a solution due to the very small SSE when really it is quite far from a solution because the peaks in the data are so miniscule. I have tried specifying TolFun and TolX via statset as options for fitnlm, but the fit remains poor as shown below.
>> opts = statset('Display','iter','TolFun',1e-20,'TolX',1e-20);
>> mdl = fitnlm(dihedral,energy2,expr,ks,'options',opts)
Norm of Norm of
Iteration SSE Gradient Step
0 2.37101e+07
1 18739 539.542 737.749
2 3.1713 5.35842 132.581
3 0.000750361 0.00704237 1.77563
4 0.000744824 8.02138e-07 0.00234913
5 0.000744824 3.01561e-06 7.92359e-07
6 0.000744824 5.24565e-07 1.49189e-07
7 0.000744824 1.03256e-07 5.7797e-07
8 0.000744824 1.20236e-06 5.45475e-09
9 0.000744824 1.57206e-06 9.88352e-12
10 0.000744824 1.11136e-06 1.39401e-14
Iterations terminated: relative change in SSE less than OPTIONS.TolFun
mdl =
Nonlinear regression model:
y ~ ks1 + (1/2)*(ks2*(1 + cos(dihedral)) + (1/2)*ks3*(1 - cos(2*dihedral)) + (1/2)*ks4*(1 + cos(3*dihedral)) + (1/2)*ks5*(1 - cos(4*dihedral)))
Estimated Coefficients:
Estimate SE tStat pValue
___________ _________ ___________ ___________
ks1 -800.51 0.0025226 -3.1733e+05 1.5132e-153
ks2 0.0001656 0.0023234 0.071275 0.94362
ks3 0.00074661 0.004499 0.16595 0.86924
ks4 -0.00094231 0.0045123 -0.20883 0.8359
ks5 -0.00069346 0.0045822 -0.15134 0.88066
Number of observations: 37, Error degrees of freedom: 32
Root Mean Squared Error: 0.00482
R-Squared: 0.00333, Adjusted R-Squared -0.121
F-statistic vs. constant model: 0.0268, p-value = 0.999
Any help in how I can refine this fit would be appreciated.
The attached workspace contains the dihedral vs energy data (dihedral = X, energy2 = Y), the expression to be fit to (expr), and the starting guess vector needed for fitnlm (i.e. beta0 = ks). An example of a failed model (mdl) and options for fitnlm (opts) is also included.
I have an answer below, but I would still like to learn what I'm doing wrong with fitnlm -- if anybody has insights, would be appreciated.

Accepted Answer

Mathieu NOE
Mathieu NOE on 25 Aug 2022
some findings that may interest you ...
as I don't have the Curve Fitting Toolbox , this work has been done with the poor's man solution (with fminsearch)
so my findings :
  1. the x data needed to be shifted so that the y minimum is relocated at x = 0 , otherwise the model provided cannot give any good fit. Simply look at the formula : the first (and dominant) harmonic is (1/2)*(ks2*(1+cos(x)) , so to have y = ks1 + (1/2)*(ks2*(1+cos(2*pi*f*x)) minimal (upper harmonics neglected here) , it requires ks2 to be negative and x be 0.
  2. the "frequency" in the cos(x) terms is incorrect as the data show a periodicity of 180 (degrees ?) so we have to modify the model and introduce that angular period : therefore cos(x) is modified into cos(2*pi*f*x) and idem for the upper harmonics
  3. at the end we can get a reasonnable good fit as shown below
  4. you can probably use these info's for your own code
x = dihedral;
y = energy2;
%% important !!
x = x - 42; % needed to shift x data to center the y minimum at x = 0 and make the fitting work !
%% curve fit using fminsearch
f_est = 1/180; % angular frequency
[y_fit,sol] = myfit(x,y,f_est); % NB : sol contains best estimates of ks1,ks2,ks3,ks4,ks5,f
Rsquared = my_Rsquared_coeff(y,y_fit); % correlation coefficient
plot(x,y, 'b .-',x, y_fit, 'r-', 'MarkerSize', 15)
title(['Model Fit : R² = ' num2str(Rsquared)], 'FontSize', 15)
xlabel('dihedral', 'FontSize', 14)
ylabel('energy²', 'FontSize', 14)
function [y_fit,sol] = myfit(x,y,f_est)
% Nonlinear regression model:
% y = ks1 + (1/2)*(ks2*(1 + cos(dihedral)) + (1/2)*ks3*(1 - cos(2*dihedral)) + (1/2)*ks4*(1 + cos(3*dihedral)) + (1/2)*ks5*(1 - cos(4*dihedral)))
func = @(ks1,ks2,ks3,ks4,ks5,f,x) ks1 + (1/2)*(ks2*(1+cos(2*pi*f*x)) + (1/2)*ks3*(1-cos(2*pi*f*2*x)) + (1/2)*ks4*(1+cos(2*pi*f*3*x)) + (1/2)*ks5*(1-cos(2*pi*f*4*x)));
obj_fun = @(params) norm(func(params(1), params(2), params(3), params(4), params(5), params(6),x)-y);
sol = fminsearch(obj_fun, [mean(y),0,0,0,0,f_est]); % to be updatd
a_sol = sol(1);
b_sol = sol(2);
c_sol = sol(3);
d_sol = sol(4);
e_sol = sol(5);
f_sol = sol(6);
y_fit = func(a_sol, b_sol, c_sol, d_sol, e_sol, f_sol,x);
function Rsquared = my_Rsquared_coeff(data,data_fit)
% R2 correlation coefficient computation
% The total sum of squares
sum_of_squares = sum((data-mean(data)).^2);
% The sum of squares of residuals, also called the residual sum of squares:
sum_of_squares_of_residuals = sum((data-data_fit).^2);
% definition of the coefficient of correlation is
Rsquared = 1 - sum_of_squares_of_residuals/sum_of_squares;
  1 Comment
Marcus on 26 Aug 2022
Thanks for your help on this. The data should have already been symmetric about 0 -- I'm not sure what I did when I made the array. Your approach definitely works (thanks!), although I'm still curious how to make it work with fitnlm if anyone else wants to post an answer addressing the problem that way.

Sign in to comment.

More Answers (1)

Alan Stevens
Alan Stevens on 25 Aug 2022
Try changing the dihedral angles to radians (or change cos to cosd in expr).
Alan Stevens
Alan Stevens on 27 Aug 2022
No, I'm afraid I don't have the Staistics and machine learning toolbox.

Sign in to comment.




Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!