The outcome of a response variable might be one of a restricted set of possible values. If there are only two possible outcomes, such as male and female for gender, these responses are called binary responses. If there are multiple outcomes, then they are called polytomous responses. Some examples of polytomous responses include levels of a disease (mild, medium, severe), preferred districts to live in a city, the species for a certain flower type, and so on. Sometimes there might be a natural order among the response categories. These responses are called ordinal responses.
The ordering might be inherent in the category choices, such as an individual being not satisfied, satisfied, or very satisfied with an online customer service. The ordering might also be introduced by categorization of a latent (continuous) variable, such as in the case of an individual being in the low risk, medium risk, or high risk group for developing a certain disease, based on a quantitative medical measure such as blood pressure.
You can specify a multinomial regression model that uses the natural ordering among the response categories. This ordinal model describes the relationship between the cumulative probabilities of the categories and predictor variables.
Different link functions can describe this relationship; the logit and probit links are the most commonly used.
Logit: The default link function that mnrfit uses for ordinal categories is the logit link function, which models the log cumulative odds. The 'link','logit' name-value pair specifies this in mnrfit. The log cumulative odds is the logarithm of the ratio of the probability that a response belongs to a category with a value less than or equal to category j, P(y ≤ cj), to the probability that a response belongs to a category with a value greater than category j, P(y > cj).
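The definition of the log cumulative odds can be checked numerically. The following is a minimal Python sketch, independent of mnrfit; the function name `log_cumulative_odds` and the three category probabilities are hypothetical values chosen only for illustration.

```python
import math

def log_cumulative_odds(category_probs):
    """Return ln(P(y <= c_j) / P(y > c_j)) for j = 1, ..., k-1.

    category_probs holds the k category probabilities in their natural
    order; they are assumed to be positive and to sum to 1.
    """
    cumulative = 0.0
    result = []
    # The last category is skipped because P(y <= c_k) = 1 has no finite log odds.
    for p in category_probs[:-1]:
        cumulative += p
        result.append(math.log(cumulative / (1.0 - cumulative)))
    return result

# Hypothetical probabilities for three ordered categories.
probs = [0.2, 0.5, 0.3]
logits = log_cumulative_odds(probs)
print(logits)
```

Because the cumulative probability grows with j, the k − 1 log cumulative odds are always increasing in j.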
Ordinal models are usually based on the assumption that the effects of predictor variables are the same for all categories on the logarithmic scale. That is, the model has different intercepts but common slopes (coefficients) among categories. This model is called parallel regression or the proportional odds model. It is the default for ordinal responses, and the 'interactions','off' name-value pair specifies this model in mnrfit.
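The proportional odds model is

$$\ln\left(\frac{P(y \le c_j)}{P(y > c_j)}\right) = \ln\left(\frac{\pi_1 + \cdots + \pi_j}{\pi_{j+1} + \cdots + \pi_k}\right) = \alpha_j + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p, \qquad j = 1, 2, \ldots, k-1,$$

where πj, j = 1, 2, ..., k, are the category probabilities, αj are the category-specific intercepts, and β1, ..., βp are the slopes common to all categories.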
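For example, for a response variable with three categories, there are 3 – 1 = 2 equations as follows:

$$\ln\left(\frac{\pi_1}{\pi_2 + \pi_3}\right) = \alpha_1 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p,$$

$$\ln\left(\frac{\pi_1 + \pi_2}{\pi_3}\right) = \alpha_2 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p.$$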
Under the proportional odds assumption, the partial effect of a predictor variable X is invariant to the choice of the response variable category, j. For example, if there are three categories, then the coefficients express the impact of a predictor variable on the relative risk or log odds of the response value being in category 1 versus categories 2 or 3, or in category 1 or 2 versus category 3.
Thus, a unit change in variable X2 would mean a change in the cumulative odds of the response value being in category 1 versus categories 2 or 3, or category 1 or 2 versus category 3 by a factor of exp(β2), given all else equal.
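As a numeric illustration of this invariance, the following Python sketch assumes the log cumulative odds for category j equal αj + β1X1 + β2X2 with intercepts that differ by category and slopes that do not. It does not call mnrfit; the intercepts, slopes, predictor values, and function names are all hypothetical.

```python
import math

def logistic(z):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))

def cumulative_probs(alphas, betas, x):
    """P(y <= c_j), j = 1..k-1, when ln(cumulative odds_j) = alpha_j + sum(beta_i * x_i)."""
    eta = sum(b * xi for b, xi in zip(betas, x))
    return [logistic(a + eta) for a in alphas]

def category_probs(alphas, betas, x):
    """Category probabilities pi_1..pi_k as differences of cumulative probabilities."""
    cum = cumulative_probs(alphas, betas, x) + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

def cumulative_odds(alphas, betas, x):
    """P(y <= c_j) / P(y > c_j) for j = 1..k-1."""
    return [g / (1.0 - g) for g in cumulative_probs(alphas, betas, x)]

# Hypothetical three-category model: two increasing intercepts, two common slopes.
alphas = [-1.0, 1.0]
betas = [0.5, -0.8]

x_before = [1.0, 2.0]
x_after = [1.0, 3.0]   # X2 increased by one unit, X1 held fixed

odds_before = cumulative_odds(alphas, betas, x_before)
odds_after = cumulative_odds(alphas, betas, x_after)
ratios = [after / before for before, after in zip(odds_before, odds_after)]

print(category_probs(alphas, betas, x_before))
print(ratios, math.exp(betas[1]))
```

Both entries of `ratios` agree with exp(β2) up to floating-point error: the same multiplicative change applies to the odds of category 1 versus categories 2 or 3 and to the odds of category 1 or 2 versus category 3.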
You can alternatively fit a model with different intercepts and slopes among the categories by using the 'interactions','on' name-value pair argument. However, using this option for ordinal models when the equal slopes model is true causes a loss of efficiency (you lose the advantage of estimating fewer parameters).
Probit: The 'link','probit' name-value pair argument uses the probit link function, which is based on the assumption of a normally distributed latent variable. For ordinal response variables, this is also called an ordered probit model. Consider the regression model that describes the relationship between a latent variable y* of an ordinal process and a vector of predictor variables, X,

$$y^* = X^T\beta + \varepsilon,$$

where the error term ε has a standard normal distribution. Suppose there is the following relationship between the latent variable y* and the observed variable y:

$$y = c_j \quad \text{if} \quad \alpha_{j-1} < y^* \le \alpha_j, \qquad j = 1, 2, \ldots, k,$$
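where α0 = −∞ and αk = ∞. Then, the cumulative probability of y being in category j or one of the earlier categories, P(y ≤ cj), is equal to

$$P(y \le c_j) = P(y^* \le \alpha_j) = P(X^T\beta + \varepsilon \le \alpha_j) = P(\varepsilon \le \alpha_j - X^T\beta) = \Phi(\alpha_j - X^T\beta),$$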
where Φ is the standard normal cumulative distribution function. Thus,

$$\Phi^{-1}\left(P(y \le c_j)\right) = \alpha_j - X^T\beta,$$
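where αj corresponds to the cut points of the latent variable and acts as the intercept in the regression model. This only holds under the assumptions of a normal latent variable and parallel regression. More generally, for a response variable with k categories and multiple predictors, the ordered probit model is

$$\Phi^{-1}\left(P(y \le c_j)\right) = \alpha_j - \left(\beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p\right), \qquad j = 1, 2, \ldots, k-1,$$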
where P(y ≤ cj) = π1 + π2 + ... + πj.
The coefficients indicate the impact of a unit change in the predictor variable on the likelihood of a state. A positive coefficient, β1, for example, indicates an increase in the underlying latent variable with an increase in the corresponding predictor variable, X1. Hence, it causes a decrease in P(y = c1) and an increase in P(y = ck).
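The effect of a positive coefficient can be sketched numerically. The following Python example assumes the latent-variable form y* = Xᵀβ + ε with standard normal ε and cut points αj, so that P(y ≤ cj) = Φ(αj − Xᵀβ). It is an illustration only, not the mnrfit implementation; the cut points, the coefficient, and the function name `ordered_probit_probs` are hypothetical.

```python
from statistics import NormalDist

def ordered_probit_probs(cutpoints, betas, x):
    """Category probabilities under the latent-variable ordered probit model.

    Assumes y* = x'beta + eps with eps ~ N(0, 1), and y = c_j when
    alpha_{j-1} < y* <= alpha_j, so P(y <= c_j) = Phi(alpha_j - x'beta).
    cutpoints holds alpha_1 < ... < alpha_{k-1}.
    """
    phi = NormalDist().cdf
    eta = sum(b * xi for b, xi in zip(betas, x))
    cum = [phi(a - eta) for a in cutpoints] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

# Hypothetical three-category model with one predictor and a positive slope.
cutpoints = [-0.5, 0.8]
betas = [0.7]

p_low = ordered_probit_probs(cutpoints, betas, [0.0])
p_high = ordered_probit_probs(cutpoints, betas, [1.0])   # X1 increased by one unit

print(p_low)
print(p_high)
```

With the positive slope, raising X1 shifts the latent variable upward, so the probability of the first category falls and the probability of the last category rises. The direction of change for the middle category depends on where the cut points lie.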
After estimating the model coefficients using mnrfit, you can estimate the cumulative probabilities or the cumulative number in each category using mnrval with the 'type','cumulative' name-value pair option. mnrval accepts the coefficient estimates and the model statistics that mnrfit returns, and estimates the categorical probabilities or the number in each category and their confidence intervals. You can specify whether to estimate category, cumulative, or conditional probabilities or numbers by changing the value of the 'type' name-value pair argument.
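The relationship between category and cumulative quantities can be sketched in a few lines of Python. This is only an illustration of the arithmetic and is not meant to reproduce how mnrval computes its estimates or lays out its output; the probabilities, the sample size, and the helper name `to_cumulative` are hypothetical.

```python
from itertools import accumulate

def to_cumulative(category_values):
    """Running totals over ordered categories (probabilities or counts)."""
    return list(accumulate(category_values))

# Hypothetical estimated category probabilities for one observation.
category_probs = [0.2, 0.5, 0.3]
sample_size = 50   # hypothetical number of trials at this predictor setting

cumulative_probs = to_cumulative(category_probs)

# Expected counts per category, then their running totals.
category_counts = [sample_size * p for p in category_probs]
cumulative_counts = to_cumulative(category_counts)

print(cumulative_probs)
print(cumulative_counts)
```

The last cumulative probability is always 1 and the last cumulative count equals the sample size, so only the first k − 1 cumulative values carry information.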
See also: fitglm | glmfit | glmval | mnrfit | mnrval