fitmnr
Syntax
Description
fits a multinomial regression model and returns a MnrMdl
= fitmnr(X
,Y
)MultinomialRegression
object MnrMdl
for the predictor data
in the matrix X
and the response data in Y
. The
input Y
can be a matrix of counts or a vector of response variable
classes.
uses the variables in MnrMdl
= fitmnr(Tbl
,ResponseVarName
)Tbl
as the predictor and response data. The
ResponseVarName
argument specifies which variable contains the
response data.
specifies the multinomial regression model in Wilkinson
Notation. The terms in MnrMdl
= fitmnr(Tbl
,Formula
)Formula
use only the variable names in
Tbl
.
specifies options using one or more name-value arguments in addition to any of the input
argument combinations in the previous syntaxes. For example, you can specify the link
function and the model type.MnrMdl
= fitmnr(___,Name=Value
)
Examples
Fit Nominal Multinomial Regression Model
Load the fisheriris
data set.
load fisheriris
The column vector species
contains iris flowers of three different species: setosa, versicolor, virginica. The double matrix meas
contains of four types of measurements for the flowers: the length and width of sepals and petals in centimeters. The data in species
is nominal, meaning that no natural order exists among the iris species.
Use the unique
function to display the iris species in order of their first appearance in the species
vector.
unique(species)
ans = 3x1 cell
{'setosa' }
{'versicolor'}
{'virginica' }
The output shows that virginica is the last species in the species
vector.
Fit a nominal multinomial regression model for the predictor data in meas
and the response data in species
. By default, the fitmnr
function uses virginica as the reference category because it appears last in species
.
MnrModel = fitmnr(meas,species)
MnrModel = Multinomial regression with nominal responses Value SE tStat pValue _______ ______ _______ __________ (Intercept_setosa) 2018.4 12.404 162.72 0 x1_setosa 673.85 3.5783 188.32 0 x2_setosa -568.2 3.176 -178.9 0 x3_setosa -516.44 3.5403 -145.87 0 x4_setosa -2760.9 7.1203 -387.75 0 (Intercept_versicolor) 42.638 5.2719 8.0878 6.0776e-16 x1_versicolor 2.4652 1.1228 2.1956 0.028124 x2_versicolor 6.6809 1.4789 4.5176 6.2559e-06 x3_versicolor -9.4294 1.2934 -7.2906 3.086e-13 x4_versicolor -18.286 2.0967 -8.7214 2.7476e-18 150 observations, 290 error degrees of freedom Dispersion: 1 Chi^2-statistic vs. constant model: 317.6851, p-value = 7.0555e-64
MnrModel
is a multinomial regression model object that contains the results of fitting a nominal multinomial regression model to the data. The output shows coefficient statistics for each predictor in meas
. The small p-values in the column pValue
indicate that all coefficients are statistically significant at the 95% confidence level.
Investigate the general form of the model equations by using the Formula
property of MnrModel
.
MnrModel.Formula
ans = logit(y) ~ 1 + x1 + x2 + x3 + x4
The output shows that the link function for MnrModel
is the default multinomial logit
link function. You can use the fitted model coefficients and the general form of the model equations to write these fitted model equations:
The equations describe the effects of the predictor variables on the quotient of the probability of a flower being in a category by the probability of a flower being in the reference category. For example, the estimated coefficient 2.5 in the second equation indicates that the probability of a flower being versicolor divided by the probability of a flower being virginica increases exp(2.5) times for each unit increase in . For more information about nominal multinomial regression model equations, see Multinomial Models for Nominal Responses.
Fit Multinomial Regression Model with Interactions
Load the fisheriris
data set.
load fisheriris
The column vector species
contains iris flowers of three different species: setosa, versicolor, virginica. The double matrix meas
contains of four types of measurements for the flowers: the length and width of sepals and petals in centimeters.
Create a table containing the flower measurements and the species data by using the table
function.
tbl = table(meas(:,1),meas(:,2),meas(:,3),meas(:,4),species,... VariableNames=["Sepal_Length","Sepal_Width","Petal_Length","Petal_Width","Species"])
tbl=150×5 table
Sepal_Length Sepal_Width Petal_Length Petal_Width Species
____________ ___________ ____________ ___________ __________
5.1 3.5 1.4 0.2 {'setosa'}
4.9 3 1.4 0.2 {'setosa'}
4.7 3.2 1.3 0.2 {'setosa'}
4.6 3.1 1.5 0.2 {'setosa'}
5 3.6 1.4 0.2 {'setosa'}
5.4 3.9 1.7 0.4 {'setosa'}
4.6 3.4 1.4 0.3 {'setosa'}
5 3.4 1.5 0.2 {'setosa'}
4.4 2.9 1.4 0.2 {'setosa'}
4.9 3.1 1.5 0.1 {'setosa'}
5.4 3.7 1.5 0.2 {'setosa'}
4.8 3.4 1.6 0.2 {'setosa'}
4.8 3 1.4 0.1 {'setosa'}
4.3 3 1.1 0.1 {'setosa'}
5.8 4 1.2 0.2 {'setosa'}
5.7 4.4 1.5 0.4 {'setosa'}
⋮
Fit a multinomial regression model to the flower data using the measurements as the predictor data and the species as the response data. By default, fitmnr
uses virginica as the reference category because it appears last the Species
column of tbl
. Specify the formula for the regression model to investigate whether the interaction between sepal width and sepal length is statistically significant.
MnrModel = fitmnr(tbl,"Species ~ Petal_Length*Sepal_Length + Petal_Width + Sepal_Width")
MnrModel = Multinomial regression with nominal responses Value SE tStat pValue _______ ______ ________ ___________ (Intercept_setosa) 232.64 50.281 4.6268 3.7137e-06 Sepal_Length_setosa 11.918 10.31 1.156 0.24768 Sepal_Width_setosa 27.965 4.5863 6.0975 1.0775e-09 Petal_Length_setosa -5.2318 13.66 -0.38302 0.70171 Petal_Width_setosa -260.77 9.3403 -27.919 1.5677e-171 Sepal_Length:Petal_Length_setosa -10.453 2.2141 -4.7211 2.3452e-06 (Intercept_versicolor) -231.52 43.626 -5.3068 1.1158e-07 Sepal_Length_versicolor 48.459 8.178 5.9255 3.1133e-09 Sepal_Width_versicolor 7.5712 1.7854 4.2406 2.2288e-05 Petal_Length_versicolor 47.602 9.5989 4.9591 7.0832e-07 Petal_Width_versicolor -20.603 2.7859 -7.3955 1.4089e-13 Sepal_Length:Petal_Length_versicolor -9.5086 1.7102 -5.5598 2.7005e-08 150 observations, 288 error degrees of freedom Dispersion: 1 Chi^2-statistic vs. constant model: 318.3928, p-value = 1.9971e-62
MnrModel
is a multinomial regression model object that contains the results of fitting a (default) nominal multinomial regression model to the data. The output shows coefficient statistics for each predictor in meas
. The small p-value for the interaction term in each log odds equation indicates that the interaction between sepal length and petal length has a statistically significant effect on the log quotient of the probability of a flower being in a category by the probability of a flower being in the reference category. For more information about nominal multinomial regression model equations, see Multinomial Models for Nominal Responses.
Fit Ordinal Multinomial Regression Model
Load the carbig
data set.
load carbig
The variables Acceleration
, Displacement
, Horsepower
, and Weight
contain data for car acceleration, engine displacement, horsepower, and weight, respectively. The variable MPG
contains car mileage data.
Sort the data in MPG
into four categories using the discretize
function. Create a table of predictor data using the table
function.
mileage = discretize(MPG,[9,19,29,39,48],"categorical"); X = table(Acceleration,Displacement,Horsepower,Weight,mileage,VariableNames=["Acceleration","Displacement","Horsepower","Weight","Mileage"])
X=406×5 table
Acceleration Displacement Horsepower Weight Mileage
____________ ____________ __________ ______ ___________
12 307 130 3504 [9, 19)
11.5 350 165 3693 [9, 19)
11 318 150 3436 [9, 19)
12 304 150 3433 [9, 19)
10.5 302 140 3449 [9, 19)
10 429 198 4341 [9, 19)
9 454 220 4354 [9, 19)
8.5 440 215 4312 [9, 19)
10 455 225 4425 [9, 19)
8.5 390 190 3850 [9, 19)
17.5 133 115 3090 <undefined>
11.5 350 165 4142 <undefined>
11 351 153 4034 <undefined>
10.5 383 175 4166 <undefined>
11 360 175 3850 <undefined>
10 383 170 3563 [9, 19)
⋮
The Mileage
column of the table X
contains entries that are undefined.
Fit an ordinal multinomial regression model using Acceleration
, Displacement
, Horsepower
, and Weight
as predictor variables and Mileage
as the response variable. fitmnr
ignores the rows of X
containing the undefined entries in the Mileage
column.
MnrModel = fitmnr(X,"Mileage",ModelType="ordinal")
MnrModel = Multinomial regression with ordinal responses Value SE tStat pValue _________ __________ _______ __________ (Intercept_[9, 19)) -16.69 1.9529 -8.5459 1.2757e-17 (Intercept_[19, 29)) -11.721 1.768 -6.6296 3.3667e-11 (Intercept_[29, 39)) -8.0606 1.7297 -4.6601 3.1603e-06 Acceleration 0.10476 0.079916 1.3109 0.18989 Displacement 0.010336 0.0049035 2.1078 0.035045 Horsepower 0.06452 0.01476 4.3712 1.2354e-05 Weight 0.0016638 0.00066089 2.5175 0.011821 392 observations, 1169 error degrees of freedom Dispersion: 1 Chi^2-statistic vs. constant model: 503.6344, p-value = 1.0964e-107
MnrModel
is a multinomial regression model object that contains the results of fitting an ordinal multinomial regression model to the data. The output shows coefficient statistics for each predictor in X
. By default, fitmnr
uses the logit
link function and calculates common coefficients among the ordinal response categories. This type of model is sometimes called a proportional odds model. At the 95% confidence level, the -values of 0.035045, 1.2354e-05, and 0.011821 indicate that each of the variables Displacement
, Horsepower
, and Weight
has a statistically significant effect on a log quotient of probabilities. The numerator is the probability of a car's mileage being less than or equal to a certain value, and the denominator is the probability of a car's mileage being greater than that value.
You can use the fitted model coefficients and the general form of the model equations to write these fitted model equations:
, , , , and represent the Mileage
, Acceleration
, Displacement
, Horsepower
, and Weight
variables in X
, respectively. For more information about the equations and the significance of terms in an ordinal multinomial regression model, see Multinomial Models for Ordinal Responses.
Input Arguments
X
— Predictor data
n-by-p numeric matrix
Predictor data, specified as an n-by-p numeric
matrix, where n is the number of observations in the data set and
p is the number of predictor variables. Each row of
X
corresponds to an observation, and each column corresponds to a
predictor variable.
Note
fitmnr
automatically includes a constant term (intercept) in
all models. Do not include a column of 1s in X
.
Data Types: single
| double
Y
— Response category labels
n-by-k matrix | n-by-1 numeric vector | n-by-1 logical vector | n-by-1 string vector | n-by-1 categorical array | n-by-1 cell array of character vectors
Response category labels, specified as an n-by-k matrix, or an n-by-1 numeric vector, logical vector, string vector, categorical array, or cell array of character vectors.
If
Y
is an n-by-k matrix, n is the number of observations in the data set, and k is the number of response categories. Y(i,j) is the number of outcomes of the multinomial category j for the predictor combinations given by X(i,:). In this case, the function determines the number of observations at each predictor combination.If
Y
is an n-by-1 numeric vector, logical vector, string vector, categorical array, or cell array of character vectors, the value of the vector or array indicates the response for each observation. In this case, all sample sizes are 1.
Data Types: single
| double
| logical
| char
| string
| cell
| categorical
Tbl
— Predictor and response data
table
Predictor and response data, specified as a table. The variables of
Tbl
can contain numeric or categorical data. When you specify
Tbl
, you must also specify the response data using the input
argument Y
, ResponseVarName
, or
Formula
.
If you specify the response data in
Y
, the table variables represent only the predictor data for the multinomial regression. A row of predictor values inTbl
corresponds to the observation inY
at the same position.Tbl
must have the same number of rows as the length ofY
.If you do not specify
Y
, you must indicate which variable inTbl
contains the response data by using theResponseVarName
orFormula
input argument. You can also choose a subset of factors inTbl
to use in the multinomial regression by setting thePredictorNames
name-value argument. Thefitmnr
function associates the values of the predictor variables inTbl
with the response data in the same row.
Example: snake=table(weight,length,species);
fitmnr(snake,"species")
Data Types: table
ResponseVarName
— Name of variable to use as response
string scalar | character vector
Name of the variable to use as the response, specified as a string scalar or
character vector. ResponseVarName
indicates which variable in
Tbl
contains the response data. When you specify
ResponseVarName
, you must also specify the
Tbl
input argument.
Example: snake=table(weight,length,species);
fitmnr(snake,"species")
Data Types: char
| string
Formula
— Multinomial regression model to fit
string scalar | character vector
Multinomial regression model to fit, specified as a string scalar or a character vector in Wilkinson notation.
Example: Specify the multinomial regression model as a sum of the variables
meas1
, meas2
, and meas3
and
their three-way interaction by using the formula "species ~ meas1 + meas2 +
meas3 + meas1:meas2:meas3"
.
Data Types: char
| string
Name-Value Arguments
Specify optional pairs of arguments as
Name1=Value1,...,NameN=ValueN
, where Name
is
the argument name and Value
is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.
Example: fitmnr(X,Y,Link="probit",ModelType="ordinal",CategoricalPredictors=[1,2])
specifies "probit"
as the link function for an ordinal multinomial
regression model, and specifies the first two columns in X
as categorical
variables.
CategoricalPredictors
— Predictors to treat as categorical
numeric vector | string vector | cell array of character vectors | "all"
Predictors to treat as categorical, specified as a numeric or string vector, a
cell array of character vectors, or "all"
.
Specify CategoricalPredictors
as one of the following:
Numeric vector of indices between 1 and p, where p is the number of predictor variables. The index of a predictor variable is the order in which it appears in the columns of the matrix
X
or tableTbl
.Logical vector of length p, where a true entry indicates that the corresponding predictor is categorical.
String array or cell array of character vectors, where each element in the array is the name of a predictor variable. The predictor names must match the names in
PredictorNames
orTbl
."all"
, indicating that all predictor variables are categorical.
By default, fitmnr
treats numeric predictors as
continuous, and predictor variables of other types as categorical. For more
information about categorical data, see Natural Representation of Categorical Data.
Example: CategoricalPredictors=["Location"
"Smoker"]
Example: CategoricalPredictors=[1 3 4]
Data Types: single
| double
| logical
| string
| cell
EstimateDispersion
— Indicator for estimating dispersion parameter
false
or 0
(default) | true
or 1
Indicator for estimating a dispersion parameter, specified as a numeric or logical
0
(false
) or 1
(true
).
Value | Description |
---|---|
0 (false ) | Use the theoretical dispersion value of 1 (default). |
1 (true ) | Estimate a dispersion parameter for the multinomial distribution for use in computing standard errors. |
Example: EstimateDispersion=1
Data Types: single
| double
| logical
IncludeClassInteractions
— Indicator for an interaction between response categories and coefficients
true
or 1
| false
or 0
Indicator for an interaction between response categories and coefficients,
specified as a numeric or logical 1
(true
) or
0
(false
).
Value | Description |
---|---|
1 (true ) | Default for nominal and hierarchical models. Fit a model with different coefficients across classes. |
0 (false ) | Default for ordinal models. Fit a model with a common set of coefficients
for the predictor variables, across all multinomial classes. This type of
model is often called a parallel regression or
proportional odds model. For nominal models,
fitmnr returns an intercept-only model if you set
IncludeClassInteractions to
false . |
In all cases, the model has different intercepts across classes. The
IncludeClassInteractions
value determines the dimensions of the
output array MultinomialRegression.Coefficients
.
Example: IncludeClassInteractions=1
Data Types: single
| double
| logical
IterationLimit
— Maximum number of iterations
100 (default) | positive integer
Maximum number of iteratively reweighted least squares (IRLS) iterations used to
fit the model, specified as a positive integer. The default value for
IterationLimit
is 100
.
For more information about the IRLS algorithm, see Iteratively Reweighted Least Squares.
Example: IterationLimit=400
Data Types: single
| double
Link
— Link function
"logit"
(default) | "probit"
| "comploglog"
| "loglog"
Link function to use for ordinal and hierarchical models, specified as one of the values in the following table.
Value | Function |
---|---|
"logit" (default) | f(γ) = ln(γ/(1 –γ)) |
"probit" | f(γ) = Φ-1(γ) — error term is normally distributed with variance 1 |
"comploglog" | f(γ) = ln(–ln(1 – γ)) |
"loglog" | f(γ) = ln(–ln(γ)) |
The link function defines the relationship between response probabilities and the linear combination of predictors. γ is a cumulative probability when the model has an ordinal response, and a conditional probability when the model is has a hierarchical response.
For an ordinal model, πj is the probability of an observation being in category j, and γ represents the cumulative probability of an observation being in categories 1 to j. The model with a logit link function has the form
where k represents the last category.
For a hierarchical model, πj and γ represent the probability that an observation is in category j given that it is not in the previous categories. The model with a logit link function has the form
where k represents the last category.
You cannot specify Link
for nominal models because they
always use a multinomial logit link
where πj is the
probability of an observation being in category j, and
r corresponds to the reference category.
fitmnr
uses the last category as the reference category
for nominal models.
Example: Link="loglog"
Data Types: char
| string
ModelType
— Type of model to fit
"nominal"
(default) | "ordinal"
| "hierarchical"
Type of model to fit, specified as one of the values in the following table.
Value | Description |
---|---|
"nominal" (default) | The response classes are not ordered. Multinomial regression is sometimes
called softmax regression when
ModelType is "nominal" . |
"ordinal" | The response classes have a natural order. |
"hierarchical" | The response classes are nested. |
Example: ModelType="hierarchical"
Data Types: char
| string
PredictorNames
— Names of predictor variables
string vector (default) | cell array of character vectors
Predictor names, specified as a string vector or a cell array of character vectors.
If you specify
Tbl
in the call tofitmnr
, thenPredictorNames
must be a subset of the table variables inTbl
.fitmnr
uses only the predictors specified inPredictorNames
. In this case, the default value ofPredictorNames
is the collection of names of the predictor variables inTbl
.If you specify the matrix
X
in the call tofitmnr
, you can specify any names forPredictorNames
. In this case, the default value ofPredictorNames
is the 1-by-p cell array['x1','x2',…,'xp']
, where p is the number of predictor variables.
When you specify Formula
, fitmnr
ignores PredictorNames
.
Example: PredictorNames=["time","latitude"]
Data Types: string
| cell
ResponseName
— Name to assign to response variable
string scalar | character vector
Name to assign to the response variable, specified as a string scalar or character
vector. If you specify ResponseVarName
or
Formula
, fitmnr
ignores
ResponseName
.
Example: ResponseName="City"
Data Types: char
| string
Tolerance
— Termination tolerance
1e-6
(default) | numeric scalar
Termination tolerance for the Iteratively Reweighted Least Squares (IRLS) fitting
algorithm, specified as a numeric scalar. If the relative difference of each
coefficient is less than Tolerance
across two iterations of the
IRLS algorithm, then fitmnr
considers the model to be
converged. The default value for Tolerance
is
1e-6
.
For more information about the IRLS algorithm, see Iteratively Reweighted Least Squares.
Example: Tolerance=1e-4
Data Types: single
| double
Weights
— Observation weights
n-by-1 numeric vector
Observation weights, specified as an n-by-1 numeric vector,
where n is the number of observations. The default value for
Weights
is an n-by-1 vector of ones.
Example: Weights
Data Types: single
| double
Output Arguments
MnrMdl
— Multinomial regression model
MultinomialRegression
model object
Multinomial regression model, returned as a MultinomialRegression
model object.
Algorithms
If X
or Tbl
contains NaN
values, <undefined>
values, empty characters, or empty strings, the
fitmnr
function ignores the corresponding observations in
Y
.
Version History
Introduced in R2023a
MATLAB Command
You clicked a link that corresponds to this MATLAB command:
Run the command by entering it in the MATLAB Command Window. Web browsers do not support MATLAB commands.
Select a Web Site
Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .
You can also select a web site from the following list:
How to Get Best Site Performance
Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.
Americas
- América Latina (Español)
- Canada (English)
- United States (English)
Europe
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom (English)