# compare

Class: LinearMixedModel

Compare linear mixed-effects models

## Description

results = compare(lme,altlme) returns the results of a likelihood ratio test that compares the linear mixed-effects models lme and altlme. Both models must use the same response vector in the fit and lme must be nested in altlme for a valid theoretical likelihood ratio test. Always input the smaller model first, and the larger model second.

compare tests the following null and alternate hypotheses:

H0: The observed response vector is generated by lme.

H1: The observed response vector is generated by altlme.

It is recommended that you fit lme and altlme using the maximum likelihood (ML) method prior to model comparison. If you use the restricted maximum likelihood (REML) method, then both models must have the same fixed-effects design matrix.

To test for fixed effects, use compare with the simulated likelihood ratio test when lme and altlme are fit using ML, or use the fixedEffects, anova, or coefTest methods.
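As a minimal sketch of the basic workflow, the following fits two nested models by maximum likelihood and compares them with a theoretical likelihood ratio test. The carsmall sample data set and the model formulas here are illustrative assumptions, not part of this reference page.

```matlab
% Sketch: theoretical likelihood ratio test of a random-slope term.
% carsmall (ships with Statistics and Machine Learning Toolbox) and
% these formulas are illustrative assumptions.
load carsmall
tbl = table(MPG,Weight,Model_Year);
tbl.Model_Year = categorical(tbl.Model_Year);

% Smaller model: random intercept only, fit by ML.
lme = fitlme(tbl,'MPG ~ Weight + (1|Model_Year)','FitMethod','ML');

% Larger model: adds a random slope for Weight, so lme is nested in altlme.
altlme = fitlme(tbl,'MPG ~ Weight + (Weight|Model_Year)','FitMethod','ML');

% Smaller model first, larger model second.
results = compare(lme,altlme)
```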


results = compare(___,Name,Value) also returns the results of a likelihood ratio test that compares linear mixed-effects models lme and altlme with additional options specified by one or more Name,Value pair arguments.

For example, you can check if the first input model is nested in the second input model.


[results,siminfo] = compare(lme,altlme,'NSim',nsim) returns the results of a simulated likelihood ratio test that compares linear mixed-effects models lme and altlme.

You can fit lme and altlme using ML or REML. Also, lme does not have to be nested in altlme. If you use the restricted maximum likelihood (REML) method to fit the models, then both models must have the same fixed-effects design matrix.
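For instance, assuming lme and altlme are LinearMixedModel objects already fit with fitlme or fitlmematrix, a simulated likelihood ratio test might be requested as follows; the replication count of 500 is illustrative.

```matlab
% Sketch: simulated likelihood ratio test with 500 replications.
% lme and altlme are assumed to be previously fit LinearMixedModel objects.
[results,siminfo] = compare(lme,altlme,'NSim',500);

siminfo.pValueSim     % simulation-based p-value
siminfo.pValueSimCI   % confidence interval for the p-value
```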


[results,siminfo] = compare(___,Name,Value) also returns the results of a simulated likelihood ratio test that compares linear mixed-effects models lme and altlme with additional options specified by one or more Name,Value pair arguments.

For example, you can change the options for performing the simulated likelihood ratio test, or change the confidence level of the confidence interval for the p-value.

## Input Arguments


lme — Linear mixed-effects model, specified as a LinearMixedModel object constructed using fitlme or fitlmematrix.

altlme — Alternative linear mixed-effects model fit to the same response vector but with different model specifications, specified as a LinearMixedModel object. lme must be nested in altlme, that is, lme should be obtained from altlme by setting some parameters to fixed values, such as 0. You can create a linear mixed-effects object using fitlme or fitlmematrix.

nsim — Number of replications for simulations in the simulated likelihood ratio test, specified as a positive integer. You must specify nsim to perform a simulated likelihood ratio test.

Example: 'NSim',1000

Data Types: double | single

### Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Significance level, specified as the comma-separated pair consisting of 'Alpha' and a scalar value in the range 0 to 1. For a value α, the confidence level is 100*(1–α)%.

For example, for 99% confidence intervals, you can specify the confidence level as follows.

Example: 'Alpha',0.01

Data Types: single | double

Options for performing the simulated likelihood ratio test in parallel, specified as the comma-separated pair consisting of 'Options', and a structure created by statset('LinearMixedModel').

These options require Parallel Computing Toolbox™.

compare uses the following fields.

• 'UseParallel' — false (default) for serial computation; true for parallel computation. Parallel computation requires Parallel Computing Toolbox.

• 'UseSubstreams' — false (default) to not use a separate substream of the random number generator for each iteration; true to use a separate substream for each iteration. You can use this option only with random stream types that support substreams.

• 'Streams' — If 'UseSubstreams' is true, then 'Streams' must be a single random number stream, or a scalar cell array containing a single stream. If 'UseSubstreams' is false and 'UseParallel' is false, then 'Streams' must likewise be a single random number stream, or a scalar cell array containing a single stream. If 'UseParallel' is true, then the number of streams must equal the number of processors used; if a parallel pool is open, 'Streams' must have the same length as the size of the pool. Because a parallel pool might open automatically when 'UseParallel' is true, it is best to open a pool explicitly using the parpool command before calling compare with the 'UseParallel',true option.

For information on parallel statistical computing at the command line, enter

help parallelstats

Data Types: struct

Indicator to check nesting between two models, specified as the comma-separated pair consisting of 'CheckNesting' and one of the following.

• false (default) — No checks.

• true — compare checks whether the smaller model lme is nested in the bigger model altlme.

lme must be nested in the alternate model altlme for a valid theoretical likelihood ratio test. compare returns an error message if the nesting requirements are not satisfied.

Although valid for both tests, the nesting requirements are weaker for the simulated likelihood ratio test.

Example: 'CheckNesting',true

Data Types: single | double

## Output Arguments


Results of the likelihood ratio test or simulated likelihood ratio test, returned as a dataset array with two rows. The first row is for lme, and the second row is for altlme. The columns of results depend on whether the test is a likelihood ratio or a simulated likelihood ratio test.

• If you use the likelihood ratio test, then results contains the following columns.

  - Model — Name of the model
  - DF — Degrees of freedom, that is, the number of free parameters in the model
  - AIC — Akaike information criterion for the model
  - BIC — Bayesian information criterion for the model
  - LogLik — Maximized log likelihood for the model
  - LRStat — Likelihood ratio test statistic for comparing altlme versus lme
  - deltaDF — DF for altlme minus DF for lme
  - pValue — p-value for the likelihood ratio test

• If you use the simulated likelihood ratio test, then results contains the following columns.

  - Model — Name of the model
  - DF — Degrees of freedom, that is, the number of free parameters in the model
  - AIC — Akaike information criterion for the model
  - BIC — Bayesian information criterion for the model
  - LogLik — Maximized log likelihood for the model
  - LRStat — Likelihood ratio test statistic for comparing altlme versus lme
  - pValue — p-value for the simulated likelihood ratio test
  - Lower — Lower limit of the confidence interval for pValue
  - Upper — Upper limit of the confidence interval for pValue

Simulation output, returned as a structure with the following fields.

• nsim — Value set for nsim.

• alpha — Value set for 'Alpha'.

• pValueSim — Simulation-based p-value.

• pValueSimCI — Confidence interval for pValueSim. The first element of the vector is the lower limit, and the second element is the upper limit.

• deltaDF — Number of free parameters in altlme minus the number of free parameters in lme, that is, DF for altlme minus DF for lme.

• TH0 — Vector of simulated likelihood ratio test statistics under the null hypothesis that the model lme generated the observed response vector y.

## Examples


Load the flu sample data. The flu dataset array has a Date variable and 10 variables containing estimated influenza rates (in nine different regions, estimated from Google® searches, plus a nationwide estimate from the CDC).

To fit a linear mixed-effects model, your data must be in a properly formatted dataset array. To fit a linear mixed-effects model with the influenza rates as the responses and region as the predictor variable, combine the nine columns corresponding to the regions into a single array. The new dataset array, flu2, must have the response variable FluRate, the nominal variable Region that identifies the region each estimate is from, and the grouping variable Date.

flu2 = stack(flu,2:10,'NewDataVarName','FluRate',...
'IndVarName','Region');
flu2.Date = nominal(flu2.Date);

Fit a linear mixed-effects model, with a varying intercept and varying slope for each region, grouped by Date.

altlme = fitlme(flu2,'FluRate ~ 1 + Region + (1 + Region|Date)');

Fit a linear mixed-effects model with fixed effects for the region and a random intercept that varies by Date.

lme = fitlme(flu2,'FluRate ~ 1 + Region + (1|Date)');

Compare the two models. Also check whether lme is nested in altlme.

compare(lme,altlme,'CheckNesting',true)
ans =
Theoretical Likelihood Ratio Test

Model     DF    AIC        BIC        LogLik     LRStat    deltaDF    pValue
lme       11     318.71     364.35    -148.36
altlme    55    -305.51    -77.346     207.76    712.22    44         0

The small $p$-value of 0 indicates that model altlme is significantly better than the simpler model lme.

The dataset array includes data from a split-plot experiment, where soil is divided into three blocks based on the soil type: sandy, silty, and loamy. Each block is divided into five plots, where five different types of tomato plants (cherry, heirloom, grape, vine, and plum) are randomly assigned to these plots. The tomato plants in the plots are then divided into subplots, where each subplot is treated by one of four fertilizers. This is simulated data.

Store the data in a dataset array called ds for convenience, and define Tomato, Soil, and Fertilizer as nominal variables.

ds = fertilizer;
ds.Tomato = nominal(ds.Tomato);
ds.Soil = nominal(ds.Soil);
ds.Fertilizer = nominal(ds.Fertilizer);

Fit a linear mixed-effects model, where Fertilizer and Tomato are the fixed-effects variables, and the mean yield varies by the block (soil type) and the plots within blocks (tomato types within soil types) independently.

lmeBig = fitlme(ds,'Yield ~ Fertilizer * Tomato + (1|Soil) + (1|Soil:Tomato)');

Refit the model after removing the interaction term Tomato:Fertilizer and the random-effects term (1 | Soil).

lmeSmall = fitlme(ds,'Yield ~ Fertilizer + Tomato + (1|Soil:Tomato)');

Compare the two models using the simulated likelihood ratio test with 1000 replications. You must use this test because the models differ in both fixed-effects and random-effects terms. Note that both models are fit using the default fitting method, ML, so there is no restriction on the fixed-effects design matrices. If you use the restricted maximum likelihood (REML) method instead, both models must have identical fixed-effects design matrices.

[table,siminfo] = compare(lmeSmall,lmeBig,'nsim',1000)
table =
Simulated Likelihood Ratio Test: Nsim = 1000, Alpha = 0.05

Model       DF    AIC       BIC       LogLik     LRStat    pValue
lmeSmall    10    511.06       532    -245.53
lmeBig      23    522.57    570.74    -238.29    14.491    0.57343

Lower      Upper

0.54211    0.60431

siminfo = struct with fields:
nsim: 1000
alpha: 0.0500
pvalueSim: 0.5734
pvalueSimCI: [0.5421 0.6043]
TH0: [1000x1 double]

The high $p$-value suggests that the larger model, lmeBig, is not significantly better than the smaller model, lmeSmall. The smaller values of the Akaike and Bayesian information criteria for lmeSmall also support this.

Fit a linear mixed-effects model for miles per gallon (MPG), with fixed effects for acceleration, horsepower, and the cylinders, and potentially correlated random effects for intercept and acceleration grouped by model year.

First, load the carbig sample data and prepare the design matrices.

load carbig
X = [ones(406,1) Acceleration Horsepower];
Z = [ones(406,1) Acceleration];
Model_Year = nominal(Model_Year);
G = Model_Year;

Now, fit the model using fitlmematrix with the defined design matrices and grouping variables.

lme = fitlmematrix(X,MPG,Z,G,'FixedEffectPredictors',....
{'Intercept','Acceleration','Horsepower'},'RandomEffectPredictors',...
{{'Intercept','Acceleration'}},'RandomEffectGroups',{'Model_Year'});

Refit the model with uncorrelated random effects for intercept and acceleration. First prepare the random effects design and the random effects grouping variables.

Z = {ones(406,1),Acceleration};
G = {Model_Year,Model_Year};

lme2 = fitlmematrix(X,MPG,Z,G,'FixedEffectPredictors',....
{'Intercept','Acceleration','Horsepower'},'RandomEffectPredictors',...
{{'Intercept'},{'Acceleration'}},'RandomEffectGroups',...
{'Model_Year','Model_Year'});

Compare lme and lme2 using the simulated likelihood ratio test.

compare(lme2,lme,'CheckNesting',true,'NSim',1000)
ans =

Simulated Likelihood Ratio Test: Nsim = 1000, Alpha = 0.05

Model    DF    AIC       BIC       LogLik     LRStat    pValue      Lower
lme2     6     2194.5    2218.3    -1091.3
lme      7     2193.5    2221.3    -1089.7    3.0323    0.095904    0.078373

Upper

0.11585

The high $p$-value indicates that lme is not a significantly better fit than lme2.

The dataset array includes data from a split-plot experiment, where soil is divided into three blocks based on the soil type: sandy, silty, and loamy. Each block is divided into five plots, where five different types of tomato plants (cherry, heirloom, grape, vine, and plum) are randomly assigned to these plots. The tomato plants in the plots are then divided into subplots, where each subplot is treated by one of four fertilizers. This is simulated data.

Store the data in a table called tbl, and define Tomato, Soil, and Fertilizer as categorical variables.

tbl = dataset2table(fertilizer);
tbl.Tomato = categorical(tbl.Tomato);
tbl.Soil = categorical(tbl.Soil);
tbl.Fertilizer = categorical(tbl.Fertilizer);

Fit a linear mixed-effects model, where Fertilizer and Tomato are the fixed-effects variables, and the mean yield varies by the block (soil type), and the plots within blocks (tomato types within soil types) independently.

lme = fitlme(tbl,'Yield ~ Fertilizer * Tomato + (1|Soil) + (1|Soil:Tomato)');

Refit the model after removing the interaction term Tomato:Fertilizer and the random-effects term (1|Soil).

lme2 = fitlme(tbl,'Yield ~ Fertilizer + Tomato + (1|Soil:Tomato)');

Create the options structure for LinearMixedModel.

opt = statset('LinearMixedModel')
opt = struct with fields:
Display: 'off'
MaxFunEvals: []
MaxIter: 10000
TolBnd: []
TolFun: 1.0000e-06
TolTypeFun: []
TolX: 1.0000e-12
TolTypeX: []
Jacobian: []
DerivStep: []
FunValCheck: []
Robust: []
RobustWgtFun: []
WgtFun: []
Tune: []
UseParallel: []
UseSubstreams: []
Streams: {}
OutputFcn: []

Change the options for parallel testing.

opt.UseParallel = true;

Start a parallel environment.

mypool = parpool();
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 6).

Compare lme2 and lme using the simulated likelihood ratio test with 1000 replications and parallel computing.

compare(lme2,lme,'nsim',1000,'Options',opt)
ans =
Simulated Likelihood Ratio Test: Nsim = 1000, Alpha = 0.05

Model    DF    AIC       BIC       LogLik     LRStat    pValue     Lower    Upper
lme2     10    511.06       532    -245.53
lme      23    522.57    570.74    -238.29    14.491    0.53447    0.503    0.56573

The high $p$-value suggests that the larger model, lme, is not significantly better than the smaller model, lme2. The smaller values of AIC and BIC for lme2 also support this.