MATLAB Answers

Different confidence intervals for regression slope

Brian Scannell on 28 Apr 2017
Commented: Star Strider on 28 Apr 2017
Can anyone explain why I get different confidence limits for the slope of a linear regression when I use polyfit and polyparci compared with fitlm and coefCI? For example, the following code generates some linearly correlated data with added noise, then does the least-squares fit three ways (directly, using polyfit, and using fitlm), extracting the key items of data at each step:
clear variables
x = (0:10)';
Y = 3.5*x + (((rand(size(x))-0.5)/3).*x);
% option 1: direct least-squares solve with the backslash operator
X = [ones(size(Y)), x];
B1 = X\Y;
Ycalc = X*B1;
R21 = 1 - sum((Y - Ycalc).^2)/sum((Y - mean(Y)).^2);
R2a1 = 1 - ((1-R21)*(length(Y)-1)/(length(Y)-length(B1)));
clear X Ycalc
% option 2: polyfit, with confidence limits from polyparci
[p,S] = polyfit(x,Y,1);
B2 = fliplr(p)';
coef = corrcoef(x,Y);
R22 = coef(1,2)^2;
R2a2 = 1 - ((1-R22)*(length(Y)-1)/(length(Y)-length(B2)));
ci2 = polyparci(p,S,0.95);
clear p S coef
% option 3: fitlm, with confidence limits from coefCI
mdl = fitlm(x,Y,'y ~ x1');
B3 = mdl.Coefficients{:,1};
R23 = mdl.Rsquared.Ordinary;
R2a3 = mdl.Rsquared.Adjusted;
ci3 = coefCI(mdl,0.05);
ci3 = fliplr(ci3');
clear mdl
As one would expect, all of the approaches produce the same regression coefficients, R-squared and adjusted R-squared values. However, the confidence intervals generated by polyparci and coefCI are different. In all cases I have tried, the range of the confidence limits returned by coefCI is wider than that from polyparci.
Can anyone explain why the methods produce different results?
Thanks, Brian


Answers (2)

Brian Scannell on 28 Apr 2017
Ah, I think I've resolved it. There appears to be a difference in the way that the confidence interval alpha is interpreted. Calling polyparci(p,S,0.95) and coefCI(mdl,0.1) gives the same answers.
I'm still not sure which set of limits is most appropriately described as the "95% confidence intervals" though - any views?
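The factor of two can be checked by hand. The following is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is available (for tinv) and using fixed illustrative data rather than the random data in the question; all variable names are chosen for this example only. It builds the coefficient intervals from the standard errors with two different t quantiles.

```matlab
% Illustrative data (fixed values chosen only for this sketch)
x = (0:10)';
Y = [0.1 3.4 7.2 10.3 14.5 17.1 21.6 24.0 28.9 31.2 35.4]';

% Least-squares fit and coefficient standard errors
X   = [ones(size(x)), x];
B   = X\Y;                                   % [intercept; slope]
df  = length(Y) - length(B);                 % residual degrees of freedom
res = Y - X*B;                               % residuals
s2  = sum(res.^2)/df;                        % residual variance estimate
se  = sqrt(s2*diag((X'*X)\eye(size(X,2))));  % standard errors of B

% Two-sided 95% interval: 2.5% in each tail
t95  = tinv(1 - 0.05/2, df);
ci95 = [B - t95*se, B + t95*se];

% Two-sided 90% interval: 5% in each tail
t90  = tinv(1 - 0.10/2, df);
ci90 = [B - t90*se, B + t90*se];

% A one-sided 95% quantile is the same number as the two-sided 90% one
tOneSided = tinv(0.95, df);
disp([t95, t90, tOneSided])
```

With 9 degrees of freedom the quantiles are roughly 2.262 and 1.833, so the 95% interval is the wider one. For a model fitted to the same data, ci95 should agree with coefCI(mdl,0.05) and ci90 with coefCI(mdl,0.1), with rows ordered intercept then slope.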



Star Strider on 28 Apr 2017
I originally tested polyparci only with nlparci, and the estimates then were essentially the same. I posted it before fitlm appeared.
Change the ‘tstat’ assignment in polyparci to:
tstat = @(tval) (max(alpha,(1-alpha)) - t_cdf(tval,PolyS.df) ); % Function to calculate t-statistic for p = ‘alpha’ and v = ‘PolyS.df’
and the results are identical with nlparci, fitlm and regress.
Thank you for discovering this glitch with the ‘alpha’ argument. I’ll update polyparci and post it.
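The max(alpha,(1-alpha)) term in the corrected line suggests the intent is to accept either convention for the argument. Here is a sketch of that idea, written with tinv from the Statistics and Machine Learning Toolbox rather than polyparci's own t_cdf helper, whose definition is not shown in this thread. The names confLevel and tcrit are hypothetical and this is not polyparci's actual code.

```matlab
% Hypothetical helper, not polyparci's actual code: treat 0.05 and 0.95
% as the same request for a 95% two-sided interval.
confLevel = @(alpha) max(alpha, 1 - alpha);

df = 9;   % e.g. 11 data points and 2 fitted coefficients

% Two-sided critical value: split the excluded probability across both tails
tcrit = @(alpha) tinv(1 - (1 - confLevel(alpha))/2, df);

tA = tcrit(0.05);   % argument given as a significance level
tB = tcrit(0.95);   % argument given as a confidence level
fprintf('%.4f  %.4f\n', tA, tB);   % both calls use the 0.975 quantile
```

One limitation of this convention is that it cannot express a confidence level below 50%, since any such value is read as its complement.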

  2 Comments

Brian Scannell on 28 Apr 2017
I am less confused by the alpha versus 1 - alpha issue than by the fact that to get matching results I have to specify 0.95 in polyparci and 0.1 (effectively 0.9) in coefCI.
I am interpreting the results from polyparci as being "there is a 95% probability that the "true" gradient is less than the calculated upper limit". Similarly, "there is a 95% probability that the "true" gradient is more than the lower limit". Taken together, it means there is a 10% chance that the "true" gradient is outside the bounds defined by the upper and lower limits.
So if the alpha input to coefCI is for the probability of the "true" gradient being outside the returned limits, then the factor two difference in the alpha value for the two functions makes sense.
But is this a correct interpretation of the outputs from the two functions?
Is this a distinction between "confidence limits" and "confidence interval"?
Thanks for your help.
Star Strider on 28 Apr 2017
My pleasure.
With the correction I posted, there is no ambiguity, and the confidence interval will be the same.
My impression is that the confidence interval calculation in nlparci changed between the time I wrote the function and now. I changed my function to accord with the current behavior of the MATLAB Statistics and Machine Learning Toolbox functions.
‘Taken together, it means there is a 10% chance that the "true" gradient is outside the bounds defined by the upper and lower limits.’
That is incorrect, at least as I read it. For a 95% confidence interval (alpha = 0.05), there is a 95% probability that the true value is within those limits and a 5% probability (2.5% in each tail) that it lies outside them.
The terms ‘confidence limits’ and ‘confidence interval’ are essentially the same. The context must be clear if either term is used. I prefer the term ‘confidence limits’.
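The two readings discussed in this thread can be written side by side. This is a sketch under the usual assumptions of normally distributed errors, n data points and two fitted coefficients, with probabilities understood in the repeated-sampling sense; the symbols L, U and SE are notation introduced here, not taken from either function's documentation.

```latex
% Two-sided 100(1-\alpha)% interval for the slope
\[
\hat\beta_1 \pm t_{1-\alpha/2,\,n-2}\,\mathrm{SE}(\hat\beta_1)
\]

% A pair of one-sided 95% bounds (5% in each tail)
\[
L = \hat\beta_1 - t_{0.95,\,n-2}\,\mathrm{SE}(\hat\beta_1), \qquad
U = \hat\beta_1 + t_{0.95,\,n-2}\,\mathrm{SE}(\hat\beta_1)
\]

% Coverage of the two bounds taken together
\[
P(L \le \beta_1 \le U) = 1 - 0.05 - 0.05 = 0.90
\]
```

Setting alpha = 0.10 in the first expression gives the same quantile, t at 0.95, as the one-sided bounds, which is why a pair of one-sided 95% limits coincides with a two-sided 90% interval. A two-sided 95% interval uses the larger 0.975 quantile and is therefore wider.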
