Probit: removing groups that perfectly predict failures

Hi all,
I have group-year panel data, as attached. Apologies that the data is of very low quality.
There are 3 groups, each with 20 observations. The outcome y is a dummy variable for success. The first column of x is a continuous variable, "effort". The second column is a dummy indicating group A, and the third column is a dummy for group B. There is no dummy for group C, to avoid collinearity.
I want to predict the probability of success using a probit model. The code I tried is:
b = glmfit(x,y,'binomial','Link','probit');
b =
0.1857 (constant)
-1.8149 (effort)
-16.1148 (group A)
-16.2994 (group B)
As you can see in the data, all outcomes for group A are failures, so the second column of x predicts y == 0 perfectly. MATLAB also raises a warning:
Warning: The estimated coefficients perfectly separate failures from successes. This means the theoretical best estimates are not finite.
For the fitted linear combination XB of the predictors, the sample proportions P of Y=N in the data satisfy:
XB<-0.834093: P=0
XB=-0.834093: P=1
XB>-0.834093: P=0
However, glmfit still returns an estimated coefficient for the group A dummy, which is b(3) = -16.1148.
Since x(:,2) perfectly predicts failures, b(3) should be 0. Is there an option in glmfit to drop the group A observations inside the function and return 0 as the coefficient for this column, so that I get something like:
b =
0.1857 (constant)
-1.8149 (effort)
0 (group A)
xxx (group B)
Stata does this automatically using the command:
probit y effort
It turns out the estimates for the constant and effort are the same in both, so the perfect-failure issue only affects the group dummy coefficients...
Thank you!!!

Accepted Answer

Kumar Pallav on 4 Aug 2021
From my understanding, you expect b(3) = 0 in the coefficient vector b because you mention that the second column of x (the group A dummy) corresponds to failures (that is, 0). But after inspecting the data, I see that the second column of x is not all zeros.
%check if any non-zero value in the vector
containsNonZero = any(x(:,2)) %returns 1 if true
However, if you change the values of the second column of x to zero:
%change second column values of x to zero
x(:,2) = 0;
b = glmfit(x,y,'binomial','Link','probit')
Then the b(3) value becomes 0.
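As far as I know, glmfit has no built-in option for this, so below is a minimal sketch of doing it by hand, closer to what the question describes for Stata: drop the observations of any group whose outcomes are all failures (or all successes), drop that group's dummy, refit, and report 0 for the dropped dummy. The function name probitDropPerfect and the dummyCols argument are made up for illustration, and it assumes x holds one 0/1 dummy per group in the listed columns. The 0 is only a placeholder for an omitted coefficient, not an estimate.

```matlab
function b = probitDropPerfect(x, y, dummyCols)
% Hypothetical helper (not part of MATLAB): probit fit that skips groups
% whose dummy perfectly predicts the outcome.
%   x         n-by-p predictors, e.g. [effort, groupA, groupB]
%   y         n-by-1 outcome coded 0/1
%   dummyCols indices of the group-dummy columns in x, e.g. [2 3]
keepRows = true(size(y));
keepCols = true(1, size(x, 2));
for j = dummyCols
    inGroup = x(:, j) == 1;
    % A group whose outcomes are all 0 or all 1 is perfectly predicted
    if any(inGroup) && (all(y(inGroup) == 0) || all(y(inGroup) == 1))
        keepRows(inGroup) = false;  % drop that group's observations
        keepCols(j) = false;        % drop its dummy from the model
    end
end
% Refit on the remaining rows and columns (glmfit adds the constant itself)
bSub = glmfit(x(keepRows, keepCols), y(keepRows), 'binomial', 'Link', 'probit');
b = zeros(size(x, 2) + 1, 1);       % constant plus one slot per column of x
b([true, keepCols]) = bSub;         % dropped dummies stay at 0
end
```

With the data from the question this would be called as b = probitDropPerfect(x, y, [2 3]). If the remaining observations are still separated, glmfit will repeat the same warning for the reduced fit.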
