Probit: removing groups that perfectly predict failures
I have group-year panel data, as attached. Apologies for the very low quality of the data.
There are 3 groups, each with 20 observations. The outcome y is a dummy variable for success. The first column in x is a continuous variable "effort". The second column is a dummy indicating group A. The third column is a dummy for group B. There is no dummy for group C, to avoid collinearity with the constant.
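The attachment is not reproduced here, but a design matrix with this layout can be sketched in MATLAB from a vector of group codes. The data below are synthetic and the names `group` and `effort` are illustrative assumptions: columns 2 and 3 are the A and B dummies, and group C is the baseline absorbed by the constant that glmfit adds by default.

```matlab
% Hypothetical synthetic data (not the attached file) with the layout above:
% 3 groups x 20 observations, group C as the omitted baseline.
rng(0);                                  % reproducible random numbers
group  = repelem((1:3)', 20);            % 60x1 group codes: 1 = A, 2 = B, 3 = C
effort = rand(60, 1);                    % continuous "effort" regressor
x = [effort, group == 1, group == 2];    % [effort, dummy A, dummy B], converted to double
```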
I want to predict the probability of success using a probit model. The code I tried is:
b = glmfit(x,y,'binomial','Link','probit');
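Among the returned coefficients (b(1) is the constant that glmfit adds, b(2) is effort) are:
b(3) = -16.1148 (group A)
b(4) = -16.2994 (group B)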
As you can see in the data, all outcomes for group A are failures, so the second column in x predicts y == 0 perfectly. MATLAB also raises a warning:
Warning: The estimated coefficients perfectly separate failures from successes. This means the theoretical best estimates are not finite.
For the fitted linear combination XB of the predictors, the sample proportions P of Y=N in the data satisfy:
However, it still returns an estimated coefficient for the group A dummy: b(3) = -16.1148.
Since x(:,2) perfectly predicts failures, this coefficient is not really identified (as the warning says, its best estimate is not finite), and I would like b(3) to be reported as 0 instead. Is there an option in glmfit that drops the group A observations and returns 0 as the coefficient for this column, so that I get something like:
0 (group A)
xxx (group B)
Stata does this automatically using the command:
probit y effort i.group
It turns out the estimates for the constant and effort are the same in Stata and in glmfit, so the perfect-failure issue only affects the group dummy coefficients.
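For comparison, Stata's behaviour can be approximated by hand in MATLAB: flag each group dummy whose observations all share one outcome, drop those rows and that column, fit the probit on what remains, and put a 0 back in the dropped slot. This is a sketch on hypothetical toy data (not the attached file), and the names `dummyCols`, `keepRow` and `dropCol` are illustrative. It assumes x has no constant column (glmfit adds one), that glmfit from the Statistics and Machine Learning Toolbox is available, and it does only one pass, whereas Stata may repeat the check after dropping observations.

```matlab
% Hypothetical toy data: columns are [effort, groupA, groupB];
% every group A outcome is a failure.
x = [0.2 1 0; 0.6 1 0; 0.9 1 0; ...
     0.3 0 1; 0.8 0 1; 0.5 0 1; ...
     0.1 0 0; 0.7 0 0; 0.4 0 0];
y = [0; 0; 0; 0; 1; 1; 1; 0; 0];
dummyCols = [2 3];                 % group-dummy columns of x (row vector)

% 1) Flag dummies whose group has a single outcome value (all 0 or all 1)
keepRow = true(size(y));
dropCol = false(1, size(x, 2));
for j = dummyCols
    inGroup = x(:, j) == 1;
    if any(inGroup) && numel(unique(y(inGroup))) == 1
        keepRow(inGroup) = false;  % drop that group's observations
        dropCol(j) = true;         % and its dummy column
    end
end

% 2) Fit the probit on the remaining rows and columns
bKept = glmfit(x(keepRow, ~dropCol), y(keepRow), 'binomial', 'Link', 'probit');

% 3) Rebuild b as [constant; columns of x], with 0 for each dropped dummy
b = zeros(size(x, 2) + 1, 1);
b([true, ~dropCol]) = bKept;
```

Once the group A dummy heads toward -Inf, the group A rows contribute almost nothing to the likelihood, so the constant and effort estimates from this refit should be close to those from the full glmfit call.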
Kumar Pallav on 4 Aug 2021
From my understanding, you expect b(3) = 0 in the coefficient vector b because you mentioned that the second column of x (the group A dummy) corresponds to failures (that is, 0). However, after inspecting the data, I see that the second column of x is not all zeros.
%check if any non-zero value in the vector
containsNonZero = any(x(:,2)) %returns 1 if true
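Whether group A perfectly predicts failure depends on the outcomes y within group A rather than on the values of x(:,2) alone. A minimal check on hypothetical toy data (not the attached file; `inA` and `allFailuresInA` are illustrative names) might look like:

```matlab
% Hypothetical toy data; column 2 of x is the group A dummy
x = [0.2 1 0; 0.6 1 0; 0.3 0 1; 0.8 0 1; 0.1 0 0; 0.7 0 0];
y = [0; 0; 0; 1; 1; 0];

inA = x(:, 2) == 1;                % rows that belong to group A
allFailuresInA = all(y(inA) == 0)  % true here: every group A outcome is 0
```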
However, if you set the second column of x to zero:
% Set all values in the second column of x to zero
x(:,2) = 0;
b = glmfit(x,y,'binomial','Link','probit')
Then, the b(3) value becomes 0.