## Generalized Linear Mixed-Effects Models

### What Are Generalized Linear Mixed-Effects Models?

Generalized linear mixed-effects (GLME) models describe the relationship between a response variable and independent variables using coefficients that can vary with respect to one or more grouping variables, for data with a response variable distribution other than normal. You can think of GLME models as extensions of generalized linear models (GLM) for data that are collected and summarized in groups. Alternatively, you can think of GLME models as a generalization of linear mixed-effects models (LME) for data where the response variable is not normally distributed.

A mixed-effects model consists of fixed-effects and random-effects terms. Fixed-effects terms are usually the conventional linear regression part of the model. Random-effects terms are associated with individual experimental units drawn at random from a population, and account for variations between groups that might affect the response. The random effects have prior distributions, whereas the fixed effects do not.

### GLME Model Equations

The standard form of a generalized linear mixed-effects model is

$$\begin{array}{c}{y}_{i}|b\end{array}\sim Distr\left({\mu}_{i},\frac{{\sigma}^{2}}{{w}_{i}}\right)$$

$$g\left(\mu \right)=X\beta +Zb+\delta \text{\hspace{0.17em}},$$

where

*y*is an*n*-by-1 response vector, and*y*is its_{i}*i*th element.*b*is the random-effects vector.*Distr*is a specified conditional distribution of*y*given*b*.*μ*is the conditional mean of*y*given*b*, and*μ*is its_{i}*i*th element.*σ*^{2}is the dispersion parameter.*w*is the effective observation weight vector, and*w*is the weight for observation_{i}*i*.For a binomial distribution, the effective observation weight is equal to the prior weight specified using the

`'Weights'`

name-value pair argument in`fitglme`

, multiplied by the binomial size specified using the`'BinomialSize'`

name-value pair argument.For all other distributions, the effective observation weight is equal to the prior weight specified using the

`'Weights'`

name-value pair argument in`fitglme`

.

*g*(*μ*) is a link function that defines the relationship between the mean response*μ*and the linear combination of the predictors.*X*is an*n*-by-*p*fixed-effects design matrix.*β*is a*p*-by-1 fixed-effects vector.*Z*is an*n*-by-*q*random-effects design matrix.*b*is a*q*-by-1 random-effects vector.*δ*is a model offset vector.

The model for the mean response *μ* is

$$\mu ={g}^{-1}\left(\eta \right)\text{\hspace{0.17em}},$$

where *g ^{-1}* is
inverse of the link function

*g*(

*μ*), and $${\widehat{\eta}}_{ME}$$ is the linear predictor of the fixed and random effects of the generalized linear mixed-effects model

$$\eta =X\beta +Zb+\delta \text{\hspace{0.17em}}.$$

A GLME model is parameterized by *β*, *θ*,
and *σ*^{2}.

The assumptions for generalized linear mixed-effects models are:

The random effects vector

*b*has the prior distribution:$$b|{\sigma}^{2},\theta \sim N\left(0,{\sigma}^{2}D\left(\theta \right)\right)\text{\hspace{0.17em}},$$

where

*σ*is the dispersion parameter, and^{2}*D*is a symmetric and positive semidefinite matrix parameterized by an unconstrained parameter vector*θ*.The observations

*y*are conditionally independent given_{i}*b*.

### Prepare Data for Model Fitting

To fit a GLME model to your data, use `fitglme`

.
Format your input data using the `table`

data type.
Each row of the table represents one observation, and each column
represents one predictor variable. For more information on creating
and using `table`

, see Create Tables and Assign Data to Them.

Input data can include continuous and grouping variables. `fitglme`

assumes
that predictors using the following data types are categorical:

Logical

Categorical

Character vector or character array

String array

Cell array of character vectors

If the input data table contains any `NaN`

values,
then `fitglme`

excludes that entire row of data
from the fit. To exclude additional rows of data, you can use the `'Exclude'`

name-value
pair argument of `fitglme`

when fitting the model.

### Choose a Distribution Type for the Model

GLME models are used when the response data does not follow
a normal distribution. Therefore, when fitting a model using `fitglme`

, you must specify the response
distribution type using the `'Distribution'`

name-value
pair argument. Often, the type of response data suggests the appropriate
distribution type for the model.

Type of Response Data | Suggested Response Distribution Type |
---|---|

Any real number | `'Normal'` |

Any positive number | `'Gamma'` or `'InverseGaussian'` |

Any nonnegative integer | `'Poisson'` |

Integer from 0 to n, where n is
a fixed positive value | `'Binomial'` |

### Choose a Link Function for the Model

GLME models use a link function, *g*, to map
the relationship between the mean response and the linear combination
of the predictors. By default, `fitglme`

uses
a predefined, commonly accepted link function based on the specified
distribution of the response data, as shown in the following table.
However, you can specify a different link function from the list of
predefined functions, or define your own, using the `'Link'`

name-value
pair argument of `fitglme`

.

Value | Description |
---|---|

`'comploglog'` | `g(mu) = log(-log(1-mu))` |

`'identity'` |
Canonical link for the normal distribution. |

`'log'` |
Canonical link for the Poisson distribution. |

`'logit'` |
Canonical link for the binomial distribution. |

`'loglog'` | `g(mu) = log(-log(mu))` |

`'probit'` | `g(mu) = norminv(mu)` |

`'reciprocal'` | `g(mu) = mu.^(-1)` |

Scalar value `P` | `g(mu) = mu.^P` |

Structure `S` | A structure containing four fields whose values are function handles: `S.Link` — Link function`S.Derivative` — Derivative`S.SecondDerivative` — Second derivative`S.Inverse` — Inverse of link
If |

When fitting a model to data, `fitglme`

uses
the canonical link function by default.

Distribution | Default Link Function |
---|---|

`'Normal'` | `'identity'` |

`'Binomial'` | `'logit'` |

`'Poisson'` | `'log'` |

`'Gamma'` | `-1` |

`'InverseGaussian'` | `-2` |

The link functions `'comploglog'`

, `'loglog'`

,
and `'probit'`

are mainly useful for binomial models.

### Specify the Model Formula

Model specification for `fitglme`

uses Wilkinson notation, which
is a character vector or string scalar of the form `'y ~ terms'`

,
where `y`

is the response variable name, and
`terms`

is written in the following notation.

Wilkinson Notation | Factors in Standard Notation |
---|---|

`1` | Constant (intercept) term |

`X^k` , where `k` is a positive
integer | `X` , `X` ,
..., `X` |

`X1 + X2` | `X1` , `X2` |

`X1*X2` | `X1` , `X2` , ```
X1.*X2
(element-wise multiplication of X1 and X2)
``` |

`X1:X2` | `X1.*X2` only |

`- X2` | Do not include `X2` |

`X1*X2 + X3` | `X1` , `X2` , `X3` , `X1*X2` |

`X1 + X2 + X3 + X1:X2` | `X1` , `X2` , `X3` , `X1*X2` |

`X1*X2*X3 - X1:X2:X3` | `X1` , `X2` , `X3` , `X1*X2` , `X1*X3` , `X2*X3` |

`X1*(X2 + X3)` | `X1` , `X2` , `X3` , `X1*X2` , `X1*X3` |

Formulas include a constant (intercept) term by default. To
exclude a constant term from the model, include `–1`

in
the formula.

For generalized linear mixed-effects models, the formula specification
is of the form `'y ~ fixed + (random1|grouping1) + ... + (randomR|groupingR)'`

,
where `fixed`

and `random`

contain
the fixed-effects and the random-effects terms, respectively.

Suppose the input data table contains the following:

A response variable,

`y`

Predictor variables,

`X1`

,`X2`

, ...,`XJ`

, where*J*is the total number of predictor variables (including continuous and grouping variables).Grouping variables,

`g1`

,`g2`

, ...,`gR`

, where*R*is the number of grouping variables.

The grouping variables in `XJ`

and `gR`

can be categorical,
logical, character arrays, string arrays, or cell arrays of character
vectors.

Then, in a formula of the form ```
'y ~ fixed + (random1|g1)
+ ... + (randomR|gR)'
```

, the term `fixed`

corresponds
to a specification of the fixed-effects design matrix `X`

, `random1`

is
a specification of the random-effects design matrix `Z1`

corresponding
to grouping variable `g1`

, and similarly `randomR`

is
a specification of the random-effects design matrix `ZR`

corresponding
to grouping variable `gR`

. You can express the `fixed`

and `random`

terms
using Wilkinson notation as follows.

Formula | Description |
---|---|

`'y ~ X1 + X2'` | Fixed effects for the intercept, `X1` , and `X2` .
This formula is equivalent to `'y ~ 1 + X1 + X2'` . |

`'y ~ -1 + X1 + X2'` | No intercept, with fixed effects for `X1` and `X2` .
The implicit intercept term is suppressed by including `-1` . |

`'y ~ 1 + (1 | g1)'` | A fixed effect for the intercept, plus a random effect for
the intercept for each level of the grouping variable `g1` . |

`'y ~ X1 + (1 | g1)'` | Random intercept model with a fixed slope. |

`'y ~ X1 + (X1 | g1)'` | Random intercept and slope, with possible correlation between
them. This formula is equivalent to `'y ~ 1 + X1 + (1 + X1|g1)'` . |

`'y ~ X1 + (1 | g1) + (-1 + X1 | g1)' ` | Independent random-effects terms for intercept and slope. |

`'y ~ 1 + (1 | g1) + (1 | g2) + (1 | g1:g2)'` | Random intercept model with independent main effects for `g1` and `g2` ,
plus an independent interaction effect. |

For example, the sample data `mfr`

contains
simulated data from a manufacturing company that operates 50 factories
across the world. Each factory runs a batch process to create a finished
product. The company wants to decrease the number of defects in each
batch, so it developed a new manufacturing process. To test the effectiveness
of the new process, the company selected 20 of its factories at random
to participate in an experiment: Ten factories implemented the new
process, while the other ten continued to run the old process. In
each of the 20 factories, the company ran five batches (for a total
of 100 batches), and recorded data on processing time (`time_dev`

),
temperature (`temp_dev`

), number of defects (`defects`

),
and a categorical variable indicating the raw materials supplier (`supplier`

)
for each batch.

To determine whether the new process (represented by the predictor
variable `newprocess`

) significantly reduces the
number of defects, fit a GLME model using `newprocess`

, `time_dev`

, `temp_dev`

,
and `supplier`

as fixed-effects predictors. Include
a random-effects intercept grouped by `factory`

,
to account for quality differences that might exist due to factory-specific
variations. The response variable `defects`

has a
Poisson distribution.

The number of defects can be modeled using a Poisson distribution

$$defect{s}_{ij}~Poisson\left({\mu}_{ij}\right)$$

This corresponds to the generalized linear mixed-effects model

$$\begin{array}{l}\mathrm{log}\left({\mu}_{ij}\right)={\beta}_{0}+{\beta}_{1}newproces{s}_{ij}+{\beta}_{2}time\_de{v}_{ij}\text{\hspace{0.17em}}\\ \text{\hspace{0.17em}}\text{\hspace{1em}}\text{\hspace{1em}}\text{\hspace{1em}}+{\beta}_{3}temp\_de{v}_{ij}+{\beta}_{4}supplier\_{C}_{ij}+{\beta}_{5}supplier\_{B}_{ij}+{b}_{i}\text{\hspace{0.17em}},\end{array}$$

where

*defects*is the number of defects observed in the batch produced by factory_{ij}*i*(where*i*= 1, 2, ..., 20) during batch*j*(where*j*= 1, 2, ..., 5).*μ*is the mean number of defects corresponding to factory_{ij}*i*during batch*j*.*supplier_C*and_{ij}*supplier_B*are dummy variables that indicate whether company_{ij}`C`

or`B`

, respectively, supplied the process chemicals for the batch produced by factory*i*during batch*j*.*b*~ N(0,_{i}*σ*_{b}^{2}) is a random-effects intercept for each factory*i*that accounts for factory-specific variation in quality.

Using Wilkinson notation, specify this model as:

```
'defects ~ 1 + newprocess + time_dev + temp_dev + supplier
+ (1|factory)'
```

To account for the Poisson distribution of the response variable,
when fitting the model using `fitglme`

, specify
the `'Distribution'`

name-value pair argument as `'Poisson'`

.
By default, `fitglme`

uses a log link function
for response variables with a Poisson distribution.

### Display the Model

The output of the fitting function `fitglme`

provides
information about generalized linear mixed-effects model.

Using the `mfr`

manufacturing experiment data,
fit a model using `newprocess`

, `time_dev`

, `temp_dev`

,
and `supplier`

as fixed-effects predictors. Specify
the response distribution as Poisson, the link function as log, and
the fit method as Laplace.

load mfr glme = fitglme(mfr,... 'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)',... 'Distribution','Poisson','Link','log','FitMethod','Laplace',... 'DummyVarCoding','effects')

glme = Generalized linear mixed-effects model fit by ML Model information: Number of observations 100 Fixed effects coefficients 6 Random effects coefficients 20 Covariance parameters 1 Distribution Poisson Link Log FitMethod Laplace Formula: defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1 | factory) Model fit statistics: AIC BIC LogLikelihood Deviance 416.35 434.58 -201.17 402.35 Fixed effects coefficients (95% CIs): Name Estimate SE tStat DF pValue '(Intercept)' 1.4689 0.15988 9.1875 94 9.8194e-15 'newprocess' -0.36766 0.17755 -2.0708 94 0.041122 'time_dev' -0.094521 0.82849 -0.11409 94 0.90941 'temp_dev' -0.28317 0.9617 -0.29444 94 0.76907 'supplier_C' -0.071868 0.078024 -0.9211 94 0.35936 'supplier_B' 0.071072 0.07739 0.91836 94 0.36078 Lower Upper 1.1515 1.7864 -0.72019 -0.015134 -1.7395 1.5505 -2.1926 1.6263 -0.22679 0.083051 -0.082588 0.22473 Random effects covariance parameters: Group: factory (20 Levels) Name1 Name2 Type Estimate '(Intercept)' '(Intercept)' 'std' 0.31381 Group: Error Name Estimate 'sqrt(Dispersion)' 1

The `Model information`

table displays the
total number of observations in the sample data (100), the number
of fixed- and random-effects coefficients (6 and 20, respectively),
and the number of covariance parameters (1). It also indicates that
the response variable has a `Poisson`

distribution,
the link function is `Log`

, and the fit method is `Laplace`

.

`Formula`

indicates the model specification
using Wilkinson’s notation.

The `Model fit statistics`

table displays statistics
used to assess the goodness of fit of the model. This includes the
Akaike information criterion (`AIC`

), Bayesian information
criterion (`BIC`

) values, log likelihood (`LogLikelihood`

),
and deviance (`Deviance`

) values.

The `Fixed effects coefficients`

table indicates
that `fitglme`

returned 95% confidence intervals.
It contains one row for each fixed-effects predictor, and each column
contains statistics corresponding to that predictor. Column 1 (`Name`

)
contains the name of each fixed-effects coefficient, column 2 (`Estimate`

)
contains its estimated value, and column 3 (`SE`

)
contains the standard error of the coefficient. Column 4 (`tStat`

)
contains the *t*-statistic for a hypothesis test
that the coefficient is equal to 0. Column 5 (`DF`

)
and column 6 (`pValue`

) contain the degrees of freedom
and *p*-value that correspond to the *t*-statistic,
respectively. The last two columns (`Lower`

and `Upper`

)
display the lower and upper limits, respectively, of the 95% confidence
interval for each fixed-effects coefficient.

`Random effects covariance parameters`

displays
a table for each grouping variable (here, only `factory`

),
including its total number of levels (20), and the type and estimate
of the covariance parameter. Here, `std`

indicates
that `fitglme`

returns the standard deviation of
the random effect associated with the factory predictor, which has
an estimated value of 0.31381. It also displays a table containing
the error parameter type (here, the square root of the dispersion
parameter), and its estimated value of 1.

The standard display generated by `fitglme`

does
not provide confidence intervals for the random-effects parameters.
To compute and display these values, use `covarianceParameters`

.

### Work with the Model

After you create a GLME model using `fitglme`

,
you can use additional functions to work with the model.

#### Inspect and Test Coefficients and Confidence Intervals

To extract estimates of the fixed- and random-effects coefficients, covariance parameters, design matrices, and related statistics:

`fixedEffects`

extracts estimated fixed-effects coefficients and related statistics from a fitted model. Related statistics include the standard error; the*t*-statistic, degrees of freedom, and*p*-value for a hypothesis test of whether each parameter is equal to 0; and the confidence intervals.`randomEffects`

extracts estimated random-effects coefficients and related statistics from a fitted GLME model. Related statistics include the estimated empirical Bayes predictor (EBP) of each random effect, the square root of the conditional mean squared error of prediction (CMSEP) given the covariance parameters and the response; the*t*-statistic, estimated degrees of freedom, and*p*-value for a hypothesis test of whether each random effect is equal to 0; and the confidence intervals.`covarianceParameters`

extracts estimated covariance parameters and related statistics from a fitted GLME model. Related statistics include estimate of the covariance parameter, and the confidence intervals.`designMatrix`

extracts the fixed- and random-effects design matrices, or a specified subset thereof, from the fitted GLME model.

To conduct customized hypothesis tests for the significance of fixed- and random-effects coefficients, and to compute custom confidence intervals:

`anova`

performs a marginal*F*-test (hypothesis test) on fixed-effects terms, to determine if all coefficients representing the fixed-effects terms are equal to 0. You can use`anova`

to test the combined significance of the coefficients of categorical predictors.`coefCI`

computes confidence intervals for fixed- and random-effects parameters from a fitted GLME model. By default,`fitglme`

computes 95% confidence intervals. Use`coefCI`

to compute the boundaries at a different confidence level.`coefTest`

performs custom hypothesis tests on fixed-effects or random-effects vectors of a fitted generalized linear mixed-effects model. For example, you can specify contrast matrices.

#### Generate New Response Values and Refit Model

To generate new response values, including fitted, predicted, and random responses, based on the fitted GLME model:

`fitted`

computes fitted response values using the original predictor values, and the estimated coefficient and parameter values from the fitted model.`predict`

computes the predicted conditional or marginal mean of the response using either the original predictor values or new predictor values, and the estimated coefficient and parameter values from the fitted model.`random`

generates random responses from a fitted model.`refit`

creates a new fitted GLME model, based on the original model and a new response vector.

#### Inspect and Visualize Residuals

To extract and visualize residuals from the fitted GLME model:

`residuals`

extracts the raw or Pearson residuals from the fitted model. You can also specify whether to compute the conditional or marginal residuals.`plotResiduals`

creates plots using the raw or Pearson residuals from the fitted model, including:A histogram of the residuals

A scatterplot of the residuals versus fitted values

A scatterplot of residuals versus lagged residuals

## See Also

`fitglme`

| `GeneralizedLinearMixedModel`