Improving fit of data to sum of cosines model

Question

Marcus on 25 Aug 2022

0
Link

Direct link to this question

https://au.mathworks.com/matlabcentral/answers/1785650-improving-fit-of-data-to-sum-of-cosines-model

Commented: Alan Stevens on 27 Aug 2022

Workspace_dihedral_vs_energy_fitting.mat

I am trying to use fitnlm to fit a potential energy versus dihedral rotation angle dataset. The fit is rather poor. My guess is that this is due to the solver thinking that it is close to a solution due to the very small SSE when really it is quite far from a solution because the peaks in the data are so miniscule. I have tried specifying TolFun and TolX via statset as options for fitnlm, but the fit remains poor as shown below.

>> opts = statset('Display','iter','TolFun',1e-20,'TolX',1e-20);
>> mdl = fitnlm(dihedral,energy2,expr,ks,'options',opts)
 
                                     Norm of         Norm of
   Iteration             SSE        Gradient           Step 
  -----------------------------------------------------------
           0     2.37101e+07
           1           18739         539.542         737.749
           2          3.1713         5.35842         132.581
           3     0.000750361      0.00704237         1.77563
           4     0.000744824     8.02138e-07      0.00234913
           5     0.000744824     3.01561e-06     7.92359e-07
           6     0.000744824     5.24565e-07     1.49189e-07
           7     0.000744824     1.03256e-07      5.7797e-07
           8     0.000744824     1.20236e-06     5.45475e-09
           9     0.000744824     1.57206e-06     9.88352e-12
          10     0.000744824     1.11136e-06     1.39401e-14
Iterations terminated: relative change in SSE less than OPTIONS.TolFun
mdl = 
Nonlinear regression model:
    y ~ ks1 + (1/2)*(ks2*(1 + cos(dihedral)) + (1/2)*ks3*(1 - cos(2*dihedral)) + (1/2)*ks4*(1 + cos(3*dihedral)) + (1/2)*ks5*(1 - cos(4*dihedral)))
Estimated Coefficients:
            Estimate         SE           tStat         pValue   
           ___________    _________    ___________    ___________
    ks1        -800.51    0.0025226    -3.1733e+05    1.5132e-153
    ks2      0.0001656    0.0023234       0.071275        0.94362
    ks3     0.00074661     0.004499        0.16595        0.86924
    ks4    -0.00094231    0.0045123       -0.20883         0.8359
    ks5    -0.00069346    0.0045822       -0.15134        0.88066
Number of observations: 37, Error degrees of freedom: 32
Root Mean Squared Error: 0.00482
R-Squared: 0.00333,  Adjusted R-Squared -0.121
F-statistic vs. constant model: 0.0268, p-value = 0.999

Any help in how I can refine this fit would be appreciated.

The attached workspace contains the dihedral vs energy data (dihedral = X, energy2 = Y), the expression to be fit to (expr), and the starting guess vector needed for fitnlm (i.e. beta0 = ks). An example of a failed model (mdl) and options for fitnlm (opts) is also included.

Update

I have an answer below, but I would still like to learn what I'm doing wrong with fitnlm -- if anybody has insights, would be appreciated.

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Sign in to answer this question.

Answer 1

Mathieu NOE on 25 Aug 2022

1
Link

Direct link to this answer

https://au.mathworks.com/matlabcentral/answers/1785650-improving-fit-of-data-to-sum-of-cosines-model#answer_1033150

Open in MATLAB Online

hello

some findings that may interest you ...

as I don't have the Curve Fitting Toolbox , this work has been done with the poor's man solution (with fminsearch)

so my findings :

the x data needed to be shifted so that the y minimum is relocated at x = 0 , otherwise the model provided cannot give any good fit. Simply look at the formula : the first (and dominant) harmonic is (1/2)*(ks2*(1+cos(x)) , so to have y = ks1 + (1/2)*(ks2*(1+cos(2*pi*f*x)) minimal (upper harmonics neglected here) , it requires ks2 to be negative and x be 0.
the "frequency" in the cos(x) terms is incorrect as the data show a periodicity of 180 (degrees ?) so we have to modify the model and introduce that angular period : therefore cos(x) is modified into cos(2*pi*f*x) and idem for the upper harmonics
at the end we can get a reasonnable good fit as shown below
you can probably use these info's for your own code

load('Workspace_dihedral_vs_energy_fitting.mat');
x = dihedral;
y = energy2;
%% important !!
x = x - 42; % needed to shift x data to center the y minimum at x = 0 and make the fitting work ! 
    
    %% curve fit using fminsearch
    f_est = 1/180; % angular frequency
    
    [y_fit,sol] = myfit(x,y,f_est);    % NB : sol contains best estimates of ks1,ks2,ks3,ks4,ks5,f
    
    Rsquared = my_Rsquared_coeff(y,y_fit); % correlation coefficient
    figure(1)
    plot(x,y, 'b .-',x, y_fit, 'r-', 'MarkerSize', 15)
    title(['Model Fit  : R² = ' num2str(Rsquared)], 'FontSize', 15)
    xlabel('dihedral', 'FontSize', 14)
    ylabel('energy²', 'FontSize', 14)
   
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function     [y_fit,sol] = myfit(x,y,f_est)
    
 
%     Nonlinear regression model:
%     y = ks1 + (1/2)*(ks2*(1 + cos(dihedral)) + (1/2)*ks3*(1 - cos(2*dihedral)) + (1/2)*ks4*(1 + cos(3*dihedral)) + (1/2)*ks5*(1 - cos(4*dihedral)))
    func = @(ks1,ks2,ks3,ks4,ks5,f,x) ks1 + (1/2)*(ks2*(1+cos(2*pi*f*x)) + (1/2)*ks3*(1-cos(2*pi*f*2*x)) + (1/2)*ks4*(1+cos(2*pi*f*3*x)) + (1/2)*ks5*(1-cos(2*pi*f*4*x)));
    obj_fun = @(params) norm(func(params(1), params(2), params(3), params(4), params(5), params(6),x)-y);
    sol = fminsearch(obj_fun, [mean(y),0,0,0,0,f_est]); % to be updatd
    a_sol = sol(1);
    b_sol = sol(2);
    c_sol = sol(3);
    d_sol = sol(4);
    e_sol = sol(5);
    f_sol = sol(6);
    y_fit = func(a_sol, b_sol, c_sol, d_sol, e_sol, f_sol,x);
    
end
function Rsquared = my_Rsquared_coeff(data,data_fit)
    % R2 correlation coefficient computation
    
    % The total sum of squares
    sum_of_squares = sum((data-mean(data)).^2);
    
    % The sum of squares of residuals, also called the residual sum of squares:
    sum_of_squares_of_residuals = sum((data-data_fit).^2);
    
    % definition of the coefficient of correlation is
    Rsquared = 1 - sum_of_squares_of_residuals/sum_of_squares;
    
    
end

1 Comment
Show -1 older commentsHide -1 older comments

Marcus on 26 Aug 2022

Thanks for your help on this. The data should have already been symmetric about 0 -- I'm not sure what I did when I made the array. Your approach definitely works (thanks!), although I'm still curious how to make it work with fitnlm if anyone else wants to post an answer addressing the problem that way.

Sign in to comment.

Answer 2

Alan Stevens on 25 Aug 2022

0
Link

Direct link to this answer

https://au.mathworks.com/matlabcentral/answers/1785650-improving-fit-of-data-to-sum-of-cosines-model#answer_1033165

Try changing the dihedral angles to radians (or change cos to cosd in expr).

3 Comments
Show 1 older commentHide 1 older comment

Mathieu NOE on 25 Aug 2022

Open in MATLAB Online

@Alan Stevens

good point !! that was quite evident ;

the code is even simpler now and there is one parameter less to optimize

still the periodicity of the data is not exactly 1 so wonder where that comes from

load('Workspace_dihedral_vs_energy_fitting.mat');
x = dihedral;
y = energy2;
%% important !!
x = x - 41; % needed to shift x data to center the y minimum at x = 0 and make the fitting work ! 
x = x*pi/180; % convert to radians
    
    %% curve fit using fminsearch
    
    [y_fit,sol] = myfit(x,y);    % NB : sol contains best estimates of ks1,ks2,ks3,ks4,ks5,f
    
    Rsquared = my_Rsquared_coeff(y,y_fit); % correlation coefficient
    figure(1)
    plot(x,y, 'b .-',x, y_fit, 'r-', 'MarkerSize', 15)
    title(['Model Fit  : R² = ' num2str(Rsquared)], 'FontSize', 15)
    xlabel('dihedral (rad)', 'FontSize', 14)
    ylabel('energy²', 'FontSize', 14)
   
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function     [y_fit,sol] = myfit(x,y)
    
 
%     Nonlinear regression model:
%     y = ks1 + (1/2)*(ks2*(1 + cos(dihedral)) + (1/2)*ks3*(1 - cos(2*dihedral)) + (1/2)*ks4*(1 + cos(3*dihedral)) + (1/2)*ks5*(1 - cos(4*dihedral)))
    func = @(ks1,ks2,ks3,ks4,ks5,x) ks1 + (1/2)*(ks2*(1+cos(x)) + (1/2)*ks3*(1-cos(2*x)) + (1/2)*ks4*(1+cos(3*x)) + (1/2)*ks5*(1-cos(4*x)));
    obj_fun = @(params) norm(func(params(1), params(2), params(3), params(4), params(5),x)-y);
    sol = fminsearch(obj_fun, [mean(y),0,0,0,0]); % to be updatd
    a_sol = sol(1);
    b_sol = sol(2);
    c_sol = sol(3);
    d_sol = sol(4);
    e_sol = sol(5);
    y_fit = func(a_sol, b_sol, c_sol, d_sol, e_sol,x);
    
end
function Rsquared = my_Rsquared_coeff(data,data_fit)
    % R2 correlation coefficient computation
    
    % The total sum of squares
    sum_of_squares = sum((data-mean(data)).^2);
    
    % The sum of squares of residuals, also called the residual sum of squares:
    sum_of_squares_of_residuals = sum((data-data_fit).^2);
    
    % definition of the coefficient of correlation is
    Rsquared = 1 - sum_of_squares_of_residuals/sum_of_squares;
    
    
end

Marcus on 27 Aug 2022

@Alan Stevens, would you know how to make this work with fitnlm?

Alan Stevens on 27 Aug 2022

No, I'm afraid I don't have the Staistics and machine learning toolbox.

Sign in to comment.

Improving fit of data to sum of cosines model

0 Comments
Show -2 older commentsHide -2 older comments

Accepted Answer

1 Comment
Show -1 older commentsHide -1 older comments

More Answers (1)

3 Comments
Show 1 older commentHide 1 older comment

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

Improving fit of data to sum of cosines model

0 Comments Show -2 older commentsHide -2 older comments

Accepted Answer

1 Comment Show -1 older commentsHide -1 older comments

More Answers (1)

3 Comments Show 1 older commentHide 1 older comment

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

0 Comments
Show -2 older commentsHide -2 older comments

1 Comment
Show -1 older commentsHide -1 older comments

3 Comments
Show 1 older commentHide 1 older comment