# predict

**Class:** `GeneralizedLinearMixedModel`

Predict response of generalized linear mixed-effects model

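## Syntax

`ypred = predict(glme)`

`ypred = predict(glme,tblnew)`

`ypred = predict(___,Name,Value)`

`[ypred,ypredCI] = predict(___)`

`[ypred,ypredCI,DF] = predict(___)`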

## Description

`ypred = predict(glme,tblnew)` returns the predicted conditional means using the new predictor values specified in `tblnew`.

If a grouping variable in `tblnew` has levels that are not in the original data, then the random effects for that grouping variable do not contribute to the `'Conditional'` prediction at observations where the grouping variable has new levels.

`ypred = predict(___,Name,Value)` returns the predicted conditional means of the response using additional options specified by one or more `Name,Value` pair arguments. For example, you can specify the confidence level, simultaneous confidence bounds, or contributions from only fixed effects. You can use any of the input arguments in the previous syntaxes.

## Input Arguments

`glme`: Generalized linear mixed-effects model

`GeneralizedLinearMixedModel` object

Generalized linear mixed-effects model, specified as a `GeneralizedLinearMixedModel` object. For properties and methods of this object, see `GeneralizedLinearMixedModel`.

`tblnew`: New input data

table | `dataset` array

New input data, which includes the response variable, predictor variables, and grouping variables, specified as a table or dataset array. The predictor variables can be continuous or grouping variables. `tblnew` must have the same variables as the original table or dataset array used in `fitglme` to fit the generalized linear mixed-effects model `glme`.

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

*Before R2021a, use commas to separate each name and value, and enclose* `Name` *in quotes.*

`Alpha`: Significance level

0.05 (default) | scalar value in the range [0,1]

Significance level, specified as the comma-separated pair consisting of `'Alpha'` and a scalar value in the range [0,1]. For a value α, the confidence level is 100 × (1 – α)%.

For example, for 99% confidence intervals, you can specify the confidence level as follows.

**Example:** `'Alpha',0.01`

**Data Types:** `single` | `double`
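The relationship between `Alpha` and the confidence level can be sketched as follows. This is a minimal illustration that assumes the `mfr` sample data and the Poisson model from the Examples section of this page; the variable names (`alpha`, `confLevel`, `ci95`, `ci99`, `width95`, `width99`) are illustrative and not part of the API.

```matlab
% Fit the example model (same call as in the Examples section).
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log', ...
    'FitMethod','Laplace','DummyVarCoding','effects');

alpha = 0.01;
confLevel = 100*(1 - alpha);                  % confidence level in percent

[~,ci95] = predict(glme,mfr);                 % default Alpha = 0.05
[~,ci99] = predict(glme,mfr,'Alpha',alpha);   % 99% intervals

% Interval widths; a smaller Alpha should give intervals at least as wide.
width95 = ci95(:,2) - ci95(:,1);
width99 = ci99(:,2) - ci99(:,1);
```

Comparing `width99` with `width95` row by row is a quick way to confirm that lowering `Alpha` widens the intervals.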

`Conditional`: Indicator for conditional predictions

`true` (default) | `false`

Indicator for conditional predictions, specified as the comma-separated pair consisting of `'Conditional'` and one of the following.

| Value | Description |
|---|---|
| `true` | Contributions from both fixed effects and random effects (conditional) |
| `false` | Contribution from only fixed effects (marginal) |

**Example:** `'Conditional',false`
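A short sketch of the two settings, assuming the `mfr` sample data and the model from the Examples section. With a log link and a random intercept per factory, the ratio of conditional to marginal predictions should be the same for all rows from one factory, because it reduces to the exponential of that factory's random effect. The names `ypredCond`, `ypredMarg`, and `ratio` are illustrative.

```matlab
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log', ...
    'FitMethod','Laplace','DummyVarCoding','effects');

ypredCond = predict(glme,mfr);                      % fixed + random effects (default)
ypredMarg = predict(glme,mfr,'Conditional',false);  % fixed effects only

% Under the log link, the ratio for each row is exp(b_i) for that row's factory.
ratio = ypredCond ./ ypredMarg;

% Rows 1 through 5 belong to the first factory (see the Examples section),
% so their ratios are expected to agree.
ratio(1:5)
```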

`DFMethod`: Method for computing approximate degrees of freedom

`'residual'` (default) | `'none'`

Method for computing approximate degrees of freedom, specified as the comma-separated pair consisting of `'DFMethod'` and one of the following.

| Value | Description |
|---|---|
| `'residual'` | The degrees of freedom value is assumed to be constant and equal to *n* – *p*, where *n* is the number of observations and *p* is the number of fixed effects. |
| `'none'` | The degrees of freedom is set to infinity. |

**Example:** `'DFMethod','none'`
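The two methods can be compared with a sketch like the following, which assumes the `mfr` sample data and the model from the Examples section (100 observations and 6 fixed-effects coefficients, so *n* – *p* would be 94). The names `dfResidual`, `DFres`, and `DFnone` are illustrative.

```matlab
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log', ...
    'FitMethod','Laplace','DummyVarCoding','effects');

n = glme.NumObservations;          % number of observations
p = numel(fixedEffects(glme));     % number of fixed-effects coefficients
dfResidual = n - p;                % value expected under 'residual'

[~,~,DFres]  = predict(glme,mfr);                    % DFMethod = 'residual' (default)
[~,~,DFnone] = predict(glme,mfr,'DFMethod','none');  % infinite degrees of freedom
```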

`Offset`: Model offset

`zeros(m,1)` (default) | *m*-by-1 vector of scalar values

Model offset, specified as a vector of scalar values of length *m*, where *m* is the number of rows in `tblnew`. The offset is used as an additional predictor and has a coefficient value fixed at `1`.
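Because the offset enters the linear predictor with a coefficient of 1, under a log link a constant offset of log(2) would be expected to double every predicted mean. The sketch below checks this, assuming the `mfr` sample data and the model from the Examples section; `ypred0`, `ypredOff`, and `ratio` are illustrative names.

```matlab
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log', ...
    'FitMethod','Laplace','DummyVarCoding','effects');

m = height(mfr);                                          % rows in the new data
ypred0   = predict(glme,mfr);                             % default offset, zeros(m,1)
ypredOff = predict(glme,mfr,'Offset',log(2)*ones(m,1));   % constant offset log(2)

% With a log link, exp(eta + log(2)) = 2*exp(eta), so the ratio should be near 2.
ratio = ypredOff ./ ypred0;
```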

`Simultaneous`: Type of confidence bounds

`false` (default) | `true`

Type of confidence bounds, specified as the comma-separated pair consisting of `'Simultaneous'` and either `false` or `true`.

- If `'Simultaneous'` is `false`, then `predict` computes nonsimultaneous confidence bounds.
- If `'Simultaneous'` is `true`, then `predict` returns simultaneous confidence bounds.

**Example:** `'Simultaneous',true`
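As a rough illustration, simultaneous bounds cover all predictions jointly and so should be at least as wide as the point-wise bounds. The sketch assumes the `mfr` sample data and the model from the Examples section; `ciPoint`, `ciSimul`, `widthPoint`, and `widthSimul` are illustrative names.

```matlab
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log', ...
    'FitMethod','Laplace','DummyVarCoding','effects');

[~,ciPoint] = predict(glme,mfr);                       % nonsimultaneous (default)
[~,ciSimul] = predict(glme,mfr,'Simultaneous',true);   % simultaneous

widthPoint = ciPoint(:,2) - ciPoint(:,1);
widthSimul = ciSimul(:,2) - ciSimul(:,1);

% Compare the first few interval widths side by side.
[widthPoint(1:5), widthSimul(1:5)]
```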

## Output Arguments

`ypred`: Predicted responses

vector

Predicted responses, returned as a vector. If the `'Conditional'` name-value pair argument is specified as `true`, `ypred` contains predictions for the conditional means of the responses given the random effects. Conditional predictions include contributions from both fixed and random effects. Marginal predictions include only contributions from fixed effects.

To compute marginal predictions, `predict` computes conditional predictions, but substitutes a vector of zeros in place of the empirical Bayes predictors (EBPs) of the random effects.
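The substitution described above can be reproduced by hand as a sanity check. The sketch below assumes the `mfr` sample data and the log-link model from the Examples section, and assumes that predictions at the original data equal the inverse link applied to the linear predictor built from `designMatrix`, `fixedEffects`, and `randomEffects`. It is an illustration of the idea, not a description of the internal implementation; `muCond` and `muMarg` are illustrative names.

```matlab
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log', ...
    'FitMethod','Laplace','DummyVarCoding','effects');

X = designMatrix(glme,'Fixed');     % fixed-effects design matrix
Z = designMatrix(glme,'Random');    % random-effects design matrix
beta = fixedEffects(glme);          % estimated fixed-effects coefficients
b = randomEffects(glme);            % empirical Bayes predictors (EBPs)

% Inverse of the log link is exp.
muCond = exp(X*beta + full(Z*b));   % conditional: fixed + random effects
muMarg = exp(X*beta);               % marginal: EBPs replaced by zeros

% Compare with predict.
maxDiffCond = max(abs(muCond - predict(glme,mfr)));
maxDiffMarg = max(abs(muMarg - predict(glme,mfr,'Conditional',false)));
```

If the assumption holds, both `maxDiffCond` and `maxDiffMarg` should be close to zero.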

`ypredCI`: Point-wise confidence intervals

two-column matrix

Point-wise confidence intervals for the predicted values, returned as a two-column matrix. The first column of `ypredCI` contains the lower bound, and the second column contains the upper bound. By default, `ypredCI` contains the 95% nonsimultaneous confidence intervals for the predictions. You can change the confidence level using the `Alpha` name-value pair argument, and make them simultaneous using the `Simultaneous` name-value pair argument.

When fitting a GLME model using `fitglme` and one of the maximum likelihood fit methods (`'Laplace'` or `'ApproximateLaplace'`), `predict` computes the confidence intervals using the conditional mean squared error of prediction (CMSEP) approach conditional on the estimated covariance parameters and the observed response. Alternatively, you can interpret the confidence intervals as approximate Bayesian credible intervals conditional on the estimated covariance parameters and the observed response.

When fitting a GLME model using `fitglme` and one of the pseudo likelihood fit methods (`'MPL'` or `'REMPL'`), `predict` bases the computations on the fitted linear mixed-effects model from the final pseudo likelihood iteration.

`DF`: Degrees of freedom

vector | scalar value

Degrees of freedom used in computing the confidence intervals, returned as a vector or a scalar value.

- If `'Simultaneous'` is `false`, then `DF` is a vector.
- If `'Simultaneous'` is `true`, then `DF` is a scalar value.
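The shape of `DF` in the two cases can be inspected with a sketch like this one, which assumes the `mfr` sample data and the model from the Examples section; `dfVec` and `dfScalar` are illustrative names.

```matlab
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log', ...
    'FitMethod','Laplace','DummyVarCoding','effects');

[~,~,dfVec]    = predict(glme,mfr);                      % nonsimultaneous: one value per row
[~,~,dfScalar] = predict(glme,mfr,'Simultaneous',true);  % simultaneous: a single value

size(dfVec)
size(dfScalar)
```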

## Examples

### Predict Responses at Original Design Values

Load the sample data.

`load mfr`

This simulated data is from a manufacturing company that operates 50 factories across the world, with each factory running a batch process to create a finished product. The company wants to decrease the number of defects in each batch, so it developed a new manufacturing process. To test the effectiveness of the new process, the company selected 20 of its factories at random to participate in an experiment: Ten factories implemented the new process, while the other ten continued to run the old process. In each of the 20 factories, the company ran five batches (for a total of 100 batches) and recorded the following data:

- Flag to indicate whether the batch used the new process (`newprocess`)
- Processing time for each batch, in hours (`time`)
- Temperature of the batch, in degrees Celsius (`temp`)
- Categorical variable indicating the supplier (`A`, `B`, or `C`) of the chemical used in the batch (`supplier`)
- Number of defects in the batch (`defects`)

The data also includes `time_dev` and `temp_dev`, which represent the absolute deviation of time and temperature, respectively, from the process standard of 3 hours at 20 degrees Celsius.

Fit a generalized linear mixed-effects model using `newprocess`, `time_dev`, `temp_dev`, and `supplier` as fixed-effects predictors. Include a random-effects term for intercept grouped by `factory`, to account for quality differences that might exist due to factory-specific variations. The response variable `defects` has a Poisson distribution, and the appropriate link function for this model is log. Use the Laplace fit method to estimate the coefficients. Specify the dummy variable encoding as `'effects'`, so the dummy variable coefficients sum to 0.

The number of defects can be modeled using a Poisson distribution:

$${\text{defects}}_{ij}\sim \text{Poisson}({\mu}_{ij})$$

This corresponds to the generalized linear mixed-effects model

$$\mathrm{log}({\mu}_{ij})={\beta}_{0}+{\beta}_{1}{\text{newprocess}}_{ij}+{\beta}_{2}{\text{time}\text{\_}\text{dev}}_{ij}+{\beta}_{3}{\text{temp}\text{\_}\text{dev}}_{ij}+{\beta}_{4}{\text{supplier}\text{\_}\text{C}}_{ij}+{\beta}_{5}{\text{supplier}\text{\_}\text{B}}_{ij}+{b}_{i},$$

where

$${\text{defects}}_{ij}$$ is the number of defects observed in the batch produced by factory $$i$$ during batch $$j$$.

$${\mu}_{ij}$$ is the mean number of defects corresponding to factory $$i$$ (where $$i=1,2,...,20$$) during batch $$j$$ (where $$j=1,2,...,5$$).

$${\text{newprocess}}_{ij}$$, $${\text{time}\text{\_}\text{dev}}_{ij}$$, and $${\text{temp}\text{\_}\text{dev}}_{ij}$$ are the measurements for each variable that correspond to factory $$i$$ during batch $$j$$. For example, $${\text{newprocess}}_{ij}$$ indicates whether the batch produced by factory $$i$$ during batch $$j$$ used the new process.

$${\text{supplier}\text{\_}\text{C}}_{ij}$$ and $${\text{supplier}\text{\_}\text{B}}_{ij}$$ are dummy variables that use effects (sum-to-zero) coding to indicate whether company `C` or `B`, respectively, supplied the process chemicals for the batch produced by factory $$i$$ during batch $$j$$.

$${b}_{i}\sim N(0,{\sigma}_{b}^{2})$$ is a random-effects intercept for each factory $$i$$ that accounts for factory-specific variation in quality.

```
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log', ...
    'FitMethod','Laplace','DummyVarCoding','effects');
```
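Because effects coding makes the supplier coefficients sum to 0, the implied coefficient for supplier `A` (which has no dummy variable of its own) can be recovered from the other two. The following sketch assumes that the supplier coefficient names begin with `supplier_`, as in the model equation above; `isSupplier` and `betaA` are illustrative names.

```matlab
load mfr
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log', ...
    'FitMethod','Laplace','DummyVarCoding','effects');

names = glme.CoefficientNames;     % fixed-effects coefficient names
beta  = fixedEffects(glme);        % fixed-effects estimates, in the same order

% Assumption: the two supplier dummy coefficients are named 'supplier_...'.
isSupplier = startsWith(names,'supplier_');

% Sum-to-zero constraint: betaA + betaC + betaB = 0.
betaA = -sum(beta(isSupplier));
```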

Predict the response values at the original design values. Display the first ten predictions along with the observed response values.

```
ypred = predict(glme);
[ypred(1:10),mfr.defects(1:10)]
```

```
ans = 10×2

    4.9883    6.0000
    5.9423    7.0000
    5.1318    6.0000
    5.6295    5.0000
    5.3499    6.0000
    5.2134    5.0000
    4.6430    4.0000
    4.5342    4.0000
    5.3903    9.0000
    4.6529    4.0000
```

Column 1 contains the predicted response values at the original design values. Column 2 contains the observed response values.

### Predict Responses at Values in New Table

Load the sample data.

`load mfr`

The `mfr` data and the model are the same as in the previous example, Predict Responses at Original Design Values. The response `defects` is modeled as Poisson with a log link, using `newprocess`, `time_dev`, `temp_dev`, and `supplier` (with `'effects'` dummy variable coding) as fixed-effects predictors and a random-effects intercept grouped by `factory`, fit with the Laplace method.

```
glme = fitglme(mfr, ...
    'defects ~ 1 + newprocess + time_dev + temp_dev + supplier + (1|factory)', ...
    'Distribution','Poisson','Link','log', ...
    'FitMethod','Laplace','DummyVarCoding','effects');
```

Predict the response values at the original design values.

```
ypred = predict(glme);
```

Create a new table by copying the first 10 rows of `mfr` into `tblnew`.

```
tblnew = mfr(1:10,:);
```

The first 10 rows of `mfr` include data collected from trials 1 through 5 for factories 1 and 2. Both factories used the old process for all of their trials during the experiment, so `newprocess = 0` for all 10 observations.

Change the value of `newprocess` to `1` for the observations in `tblnew`.

```
tblnew.newprocess = ones(height(tblnew),1);
```

Compute predicted response values and nonsimultaneous 99% confidence intervals using `tblnew`. Display the first 10 rows of the predicted values based on `tblnew`, the predicted values based on `mfr`, and the observed response values.

```
[ypred_new,ypredCI] = predict(glme,tblnew,'Alpha',0.01);
[ypred_new,ypred(1:10),mfr.defects(1:10)]
```

```
ans = 10×3

    3.4536    4.9883    6.0000
    4.1142    5.9423    7.0000
    3.5530    5.1318    6.0000
    3.8976    5.6295    5.0000
    3.7040    5.3499    6.0000
    3.6095    5.2134    5.0000
    3.2146    4.6430    4.0000
    3.1393    4.5342    4.0000
    3.7320    5.3903    9.0000
    3.2214    4.6529    4.0000
```

Column 1 contains predicted response values based on the data in `tblnew`, where `newprocess = 1`. Column 2 contains predicted response values based on the original data in `mfr`, where `newprocess = 0`. Column 3 contains the observed response values in `mfr`. Based on these results, if all other predictors retain their original values, the predicted number of defects appears to be smaller when using the new process.

Display the 99% confidence intervals for rows 1 through 10 corresponding to the new predicted response values.

```
ypredCI(1:10,1:2)
```

```
ans = 10×2

    1.6983    7.0235
    1.9191    8.8201
    1.8735    6.7380
    2.0149    7.5395
    1.9034    7.2079
    1.8918    6.8871
    1.6776    6.1597
    1.5404    6.3976
    1.9574    7.1154
    1.6892    6.1436
```
