MATLAB Answers

Where is the coefficient for the reference condition when using 'fitlm' to perform ANOVA with intercept omitted?

kndw on 26 Jun 2020
Commented: kndw on 27 Jun 2020
I want to compare the means for a variable among five different groups. To do so, I use fitlm to perform an ANOVA. The following code provides an example that closely resembles my own data:
% simulate data
x = repmat((1:5)', [10, 1]);
y = (x - 4).^2 + randn(size(x));
grp = categorical(x);
% fit model
This yields an intercept, which represents the mean for grp 1, and coefficients for the other groups that represent the difference of the mean for those groups relative to the intercept.
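The fitting call itself is not shown above. The following is a minimal sketch of a fit that behaves as described, assuming the data are placed in a table and the model is specified with the formula 'y ~ grp'. The names tbl, mdl, grpMeans and fittedMeans are illustrative, and the fixed seed is only there for repeatability.

```matlab
% simulate data as in the question (seed fixed only for repeatability)
rng(1);
x = repmat((1:5)', [10, 1]);
y = (x - 4).^2 + randn(size(x));
grp = categorical(x);

% assumed form of the fit: table input with a formula
tbl = table(y, grp);
mdl = fitlm(tbl, 'y ~ grp');

% group means computed directly from the data, for comparison
grpMeans = splitapply(@mean, y, x);   % x takes values 1..5, so it serves as group numbers

% intercept = mean of group 1; intercept + coefficient = mean of groups 2..5
b = mdl.Coefficients.Estimate;
fittedMeans = [b(1); b(1) + b(2:end)];
disp([grpMeans, fittedMeans])
```

The two displayed columns should agree up to rounding, which confirms that the intercept is the group 1 mean and each remaining coefficient is a difference from it.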
I am unsure how to calculate the standard error (SE) of each group mean from the SE of the intercept and the SEs of the coefficients, and would like to know how. I therefore tried omitting the intercept, expecting a coefficient for each group.
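The call that omits the intercept is not shown either. A minimal sketch of one way to request such a fit, assuming a table input and the formula term '- 1' to drop the intercept (tbl and mdlNoInt are illustrative names):

```matlab
% simulate data as in the question (seed fixed only for repeatability)
rng(1);
x = repmat((1:5)', [10, 1]);
y = (x - 4).^2 + randn(size(x));
grp = categorical(x);

% assumed form of the no-intercept fit
tbl = table(y, grp);
mdlNoInt = fitlm(tbl, 'y ~ grp - 1');

% list the terms that were actually estimated
disp(mdlNoInt.CoefficientNames)
disp(mdlNoInt.Coefficients)
```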
Here there is indeed no intercept, but also no coefficient for group 1. Where did it go?
As a workaround, I tried dummy coding the group variable:
grp_dummy = dummyvar(grp);
This gives me the expected result, but the SEs correspond to the SE of the intercept in the first model. How, then, do the SEs of the coefficients in the first model relate to this latter model?
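The two sets of SEs can be connected through the coefficient covariance matrix of the first model, since Var(b0 + bk) = Var(b0) + Var(bk) + 2*Cov(b0, bk). The sketch below assumes the dummy-coded fit was obtained by passing the indicator matrix to fitlm with 'Intercept' set to false (that call is not shown in the post), and the names mdlRef, mdlDummy and seFromRef are illustrative.

```matlab
% simulate data as in the question (seed fixed only for repeatability)
rng(1);
x = repmat((1:5)', [10, 1]);
y = (x - 4).^2 + randn(size(x));
grp = categorical(x);

% first model: intercept plus differences from group 1
mdlRef = fitlm(table(y, grp), 'y ~ grp');

% assumed form of the dummy-coded model: one indicator column per group, no intercept
grp_dummy = dummyvar(grp);
mdlDummy = fitlm(grp_dummy, y, 'Intercept', false);

% SE of each group mean rebuilt from the first model's covariance matrix
C = mdlRef.CoefficientCovariance;
seFromRef = zeros(5, 1);
seFromRef(1) = sqrt(C(1, 1));                      % group 1 mean is the intercept
for k = 2:5
    seFromRef(k) = sqrt(C(1, 1) + C(k, k) + 2*C(1, k));
end

disp([seFromRef, mdlDummy.Coefficients.SE])        % the two columns should match
disp(mdlRef.Coefficients.SE')                      % intercept SE, then SEs of the differences
```

With equal group sizes, as here, every group mean has SE sqrt(MSE/n), while each difference coefficient in the first model has SE sqrt(2*MSE/n). That is why the dummy-coded SEs all equal the intercept SE and the difference SEs are larger by a factor of sqrt(2).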



Accepted Answer

Jeff Miller on 27 Jun 2020
Why not just use this?
for iGrp = 1:5
    % SE of the mean for this group: sample SD divided by sqrt(group size)
    stderr = std(y(x==iGrp)) / sqrt(sum(x==iGrp))
end


kndw on 27 Jun 2020
Thanks for the suggestion. I understand I could compute the SE like that, but still I'm curious where group 1 goes when the intercept is omitted when using fitlm.
Jeff Miller on 27 Jun 2020
If the intercept is omitted, the linear model being fit constrains the true mean of group 1 to be zero. So, any deviation of the group 1 observed mean from zero contributes to error. Look at the effect of adding this to the end of your code:
grp1 = grp==categorical(1);
y(grp1) = y(grp1) + 1000;
You get the same coefficient estimates as before, but the SE of those estimates goes way up. That's because there is a lot of error in those group 1 scores, relative to the predicted values of zero. Now try
y(grp1) = y(grp1) - mean(y(grp1));
The SEs are now much smaller, even smaller than they were originally, because 0 is actually a very accurate estimate of these revised group 1 scores.
So, I think the answer is that any deviation of the group 1 scores from 0 goes into error.
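One way to check this numerically is to compare the error sums of squares of the two fits. If group 1 is predicted as 0, the no-intercept model's SSE should exceed the with-intercept model's SSE by n1 * mean(y1)^2, where y1 holds the group 1 scores and n1 is their count. This is a sketch under the assumption that both fits use a table input with formulas; mdlFull, mdlNoInt and extraSSE are illustrative names.

```matlab
% simulate data as in the question (seed fixed only for repeatability)
rng(1);
x = repmat((1:5)', [10, 1]);
y = (x - 4).^2 + randn(size(x));
grp = categorical(x);
tbl = table(y, grp);

mdlFull  = fitlm(tbl, 'y ~ grp');       % group 1 mean estimated by the intercept
mdlNoInt = fitlm(tbl, 'y ~ grp - 1');   % no term left to describe group 1

% fitted values for group 1 under the no-intercept model
y1 = y(x == 1);
disp(mdlNoInt.Fitted(x == 1)')

% extra error absorbed by the no-intercept model
extraSSE = mdlNoInt.SSE - mdlFull.SSE;
fprintf('extra SSE: %.4f, n1*mean(y1)^2: %.4f\n', extraSSE, numel(y1)*mean(y1)^2);
```

The two printed numbers should agree, showing that the whole offset of the group 1 mean from zero ends up in the residuals and therefore inflates the MSE and every SE derived from it.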
