N-Way ANOVA

Introduction to N-Way ANOVA

You can use the function anovan to perform N-way ANOVA. Use N-way ANOVA to determine if the means in a set of data differ with respect to groups (levels) of multiple factors. By default, anovan treats all grouping variables as fixed effects. For an example of ANOVA with random effects, see ANOVA with Random Effects. For repeated measures, see fitrm and ranova.

N-way ANOVA is a generalization of two-way ANOVA. For three factors, for example, the model can be written as

$y_{i j k r} = μ + α_{i} + β_{j} + γ_{k} + {(α β)}_{i j} + {(α γ)}_{i k} + {(β γ)}_{j k} + {(α β γ)}_{i j k} + ε_{i j k r},$

where

y_ijkr is an observation of the response variable. i represents group i of factor A, i = 1, 2, ..., I; j represents group j of factor B, j = 1, 2, ..., J; k represents group k of factor C, k = 1, 2, ..., K; and r represents the replication number, r = 1, 2, ..., R. For constant R, there are a total of N = I*J*K*R observations, but the number of observations does not have to be the same for each combination of groups of factors.
μ is the overall mean.
α_i are the deviations of groups of factor A from the overall mean μ due to factor A. The values of α_i sum to 0.

$\sum_{i = 1}^{I} α_{i} = 0.$
β_j are the deviations of groups in factor B from the overall mean μ due to factor B. The values of β_j sum to 0.

$\sum_{j = 1}^{J} β_{j} = 0.$
γ_k are the deviations of groups in factor C from the overall mean μ due to factor C. The values of γ_k sum to 0.

$\sum_{k = 1}^{K} γ_{k} = 0.$
(αβ)_ij is the interaction term between factors A and B. The values of (αβ)_ij sum to 0 over either index.

$\sum_{i = 1}^{I} {(α β)}_{i j} = \sum_{j = 1}^{J} {(α β)}_{i j} = 0.$
(αγ)_ik is the interaction term between factors A and C. The values of (αγ)_ik sum to 0 over either index.

$\sum_{i = 1}^{I} {(α γ)}_{i k} = \sum_{k = 1}^{K} {(α γ)}_{i k} = 0.$
(βγ)_jk is the interaction term between factors B and C. The values of (βγ)_jk sum to 0 over either index.

$\sum_{j = 1}^{J} {(β γ)}_{j k} = \sum_{k = 1}^{K} {(β γ)}_{j k} = 0.$
(αβγ)_ijk is the three-way interaction term between factors A, B, and C. The values of (αβγ)_ijk sum to 0 over any index.

$\sum_{i = 1}^{I} {(α β γ)}_{i j k} = \sum_{j = 1}^{J} {(α β γ)}_{i j k} = \sum_{k = 1}^{K} {(α β γ)}_{i j k} = 0.$
ε_ijkr are the random disturbances. They are assumed to be independent, normally distributed, and have constant variance.
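
The model terms above can be made concrete by simulating data from them. The following is a minimal sketch, not part of the original example: it assumes a balanced design with main effects only, and every numeric value (effect sizes, error standard deviation, seed) is made up for illustration. The variable names alphaA, betaB, gammaC, idxA, idxB, and idxC are hypothetical.

```matlab
% Sketch: simulate a balanced three-factor design with main effects only.
% All effect sizes below are invented illustration values.
rng(0);                           % fix the seed so the run is repeatable
I = 3; J = 2; K = 2; R = 5;       % levels of A, B, C and replications per cell
mu     = 20;                      % overall mean
alphaA = [-2; 0; 2];              % factor A deviations, constrained to sum to 0
betaB  = [-1; 1];                 % factor B deviations, constrained to sum to 0
gammaC = [-0.5; 0.5];             % factor C deviations, constrained to sum to 0

% Enumerate every (i,j,k,r) combination, then flatten to column vectors.
[idxA,idxB,idxC,~] = ndgrid(1:I,1:J,1:K,1:R);
idxA = idxA(:); idxB = idxB(:); idxC = idxC(:);

% y_ijkr = mu + alpha_i + beta_j + gamma_k + eps_ijkr, with N(0,sigma^2) errors.
sigma = 1;
y = mu + alphaA(idxA) + betaB(idxB) + gammaC(idxC) + sigma*randn(numel(idxA),1);

% Fit the main-effects model without opening the ANOVA figure.
p = anovan(y,{idxA idxB idxC},'model','linear', ...
    'varnames',{'A','B','C'},'display','off');
```

Because R is constant here, the response vector has N = I*J*K*R = 60 entries, and p holds one p-value per main effect.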

Three-way ANOVA tests hypotheses about the effects of factors A, B, C, and their interactions on the response variable y. The hypotheses about the equality of the mean responses for groups of factor A are

$\begin{array}{l} H_{0} : α_{1} = α_{2} = \dots = α_{I} \\ H_{1} : \text{at least one } α_{i} \text{ is different, } i = 1, 2, ..., I . \end{array}$

The hypotheses about the equality of the mean response for groups of factor B are

$\begin{array}{l} H_{0} : β_{1} = β_{2} = \dots = β_{J} \\ H_{1} : \text{at least one } β_{j} \text{ is different, } j = 1, 2, ..., J . \end{array}$

The hypotheses about the equality of the mean response for groups of factor C are

$\begin{array}{l} H_{0} : γ_{1} = γ_{2} = \dots = γ_{K} \\ H_{1} : \text{at least one } γ_{k} \text{ is different, } k = 1, 2, ..., K . \end{array}$

The hypotheses about the interaction of the factors are

$\begin{array}{l} H_{0} : {(α β)}_{i j} = 0 \\ H_{1} : \text{at least one } {(α β)}_{i j} \neq 0 \end{array}$

$\begin{array}{l} H_{0} : {(α γ)}_{i k} = 0 \\ H_{1} : \text{at least one } {(α γ)}_{i k} \neq 0 \end{array}$

$\begin{array}{l} H_{0} : {(β γ)}_{j k} = 0 \\ H_{1} : \text{at least one } {(β γ)}_{j k} \neq 0 \end{array}$

$\begin{array}{l} H_{0} : {(α β γ)}_{i j k} = 0 \\ H_{1} : \text{at least one } {(α β γ)}_{i j k} \neq 0 \end{array}$

In this notation, parameters with two subscripts, such as (αβ)_ij, represent the interaction effect of two factors. The parameter (αβγ)_ijk represents the three-way interaction. An ANOVA model can have the full set of parameters or any subset, but conventionally it does not include complex interaction terms unless it also includes all simpler terms for those factors. For example, one would generally not include the three-way interaction without also including all two-way interactions.

Prepare Data for N-Way ANOVA

Unlike anova1 and anova2, anovan does not expect data in a tabular form. Instead, it expects a vector of response measurements and a separate vector (or text array) containing the values corresponding to each factor. This input data format is more convenient than matrices when there are more than two factors or when the number of measurements per factor combination is not constant.

$\begin{matrix} y & = & [ & y_{1}, & y_{2}, & y_{3}, & y_{4}, & y_{5}, & \dots, & y_{N} & ]' \\ & & & ↑ & ↑ & ↑ & ↑ & ↑ & & ↑ & \\ g1 & = & \{ & \text{'A'}, & \text{'A'}, & \text{'C'}, & \text{'B'}, & \text{'B'}, & \dots, & \text{'D'} & \} \\ g2 & = & [ & 1 & 2 & 1 & 3 & 1 & \dots, & 2 & ] \\ g3 & = & \{ & \text{'hi'}, & \text{'mid'}, & \text{'low'}, & \text{'mid'}, & \text{'hi'}, & \dots, & \text{'low'} & \} \end{matrix}$
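
As a rough illustration of this input format, the sketch below builds a response vector and three grouping variables of mixed types, then passes them to anovan. The twelve measurements and their group labels are hypothetical toy values chosen only to show the layout; they do not come from any data set in this topic.

```matlab
% Sketch: one response vector plus one grouping variable per factor.
% All values below are made-up toy data.
y  = [12.1; 14.3; 9.8; 11.0; 13.5; 10.2; 15.1; 9.5; 12.8; 11.7; 13.0; 10.9];
g1 = {'A';'A';'A';'A';'B';'B';'B';'B';'C';'C';'C';'C'};    % cell array of character vectors
g2 = [1; 2; 3; 1; 2; 3; 1; 2; 3; 1; 2; 3];                 % numeric vector
g3 = {'hi';'mid';'low';'mid';'low';'hi';'low';'hi';'mid';'hi';'hi';'low'};

% Every grouping variable needs exactly one entry per observation.
n = numel(y);
assert(numel(g1) == n && numel(g2) == n && numel(g3) == n);

% Main-effects fit; the cell counts are unequal, which anovan accepts.
[p,tbl] = anovan(y,{g1 g2 g3},'varnames',{'g1','g2','g3'},'display','off');
```

The entry at position n of each grouping variable labels the observation y(n), so the three variables together identify the factor combination for every measurement.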

Perform N-Way ANOVA


This example shows how to perform N-way ANOVA on car data with mileage and other information on 406 cars made between 1970 and 1982.

Load the sample data.

load carbig

The example focuses on four variables. MPG is the number of miles per gallon for each of 406 cars (though some have missing values coded as NaN). The other three variables are factors: cyl4 (four-cylinder car or not), org (car originated in Europe, Japan, or the USA), and when (car was built early in the period, in the middle of the period, or late in the period).

Fit the full model, requesting up to three-way interactions and Type 3 sums of squares.

varnames = {'Origin';'4Cyl';'MfgDate'};
anovan(MPG,{org cyl4 when},3,3,varnames);

[Figure: N-Way ANOVA table]

Note that many terms are marked by a # symbol as not having full rank, and one of them has zero degrees of freedom and is missing a p-value. This can happen when there are missing factor combinations and the model has higher-order terms. In this case, the cross-tabulation below shows that there are no cars made in Europe during the early part of the period with other than four cylinders, as indicated by the 0 in tbl(2,1,1).

[tbl,chi2,p,factorvals] = crosstab(org,when,cyl4)

tbl = 
tbl(:,:,1) =

    82    75    25
     0     4     3
     3     3     4


tbl(:,:,2) =

    12    22    38
    23    26    17
    12    25    32

chi2 = 
207.7689

p = 
8.0973e-38

factorvals=3×3 cell array
    {'USA'   }    {'Early'}    {'Other'   }
    {'Europe'}    {'Mid'  }    {'Four'    }
    {'Japan' }    {'Late' }    {0×0 double}

Consequently it is impossible to estimate the three-way interaction effects, and including the three-way interaction term in the model makes the fit singular.
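
Missing factor combinations can also be located programmatically instead of by reading the cross-tabulation. A minimal sketch, assuming the same carbig variables and the crosstab outputs used above (the names emptyIdx, r, c, and l are hypothetical):

```matlab
% Sketch: list every factor combination that has no observations.
load carbig
[tbl,~,~,factorvals] = crosstab(org,when,cyl4);

% Convert the linear indices of zero counts to (row, column, page) subscripts.
emptyIdx = find(tbl == 0);
[r,c,l] = ind2sub(size(tbl),emptyIdx);

% Column m of factorvals holds the labels for dimension m of tbl.
for n = 1:numel(emptyIdx)
    fprintf('No cars with Origin=%s, MfgDate=%s, 4Cyl=%s\n', ...
        factorvals{r(n),1},factorvals{c(n),2},factorvals{l(n),3});
end
```

For the table shown above, the only zero count is tbl(2,1,1), so the loop reports a single missing combination.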

Using even the limited information available in the ANOVA table, you can see that the three-way interaction has a p-value of 0.699, so it is not significant.
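
The p-values can also be read from the outputs of anovan rather than from the figure. The sketch below refits the same model using the name-value form of the arguments ('model', 'sstype', 'varnames', and 'display') and picks out the three-way term by its row in the terms output. The variable names isThreeWay and pThreeWay are hypothetical.

```matlab
% Sketch: refit the full model silently and extract the three-way p-value.
load carbig
varnames = {'Origin';'4Cyl';'MfgDate'};
[p,~,~,terms] = anovan(MPG,{org cyl4 when},'model',3,'sstype',3, ...
    'varnames',varnames,'display','off');

% Each row of terms flags the factors in one term; the three-way
% interaction is the row in which all three factors are flagged.
isThreeWay = all(terms == 1,2);
pThreeWay  = p(isThreeWay);
fprintf('Three-way interaction p-value: %.3f\n',pThreeWay);
```

Locating the term through the terms matrix avoids relying on the position of the row in the displayed table.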

Examine only two-way interactions.

[p,tbl2,stats,terms] = anovan(MPG,{org cyl4 when},2,3,varnames);

[Figure: N-Way ANOVA table]

terms

terms = 6×3

     1     0     0
     0     1     0
     0     0     1
     1     1     0
     1     0     1
     0     1     1

Now all terms are estimable. The p-values for interaction term 4 (Origin*4Cyl) and interaction term 6 (4Cyl*MfgDate) are much larger than a typical cutoff value of 0.05, indicating these terms are not significant. You could choose to omit these terms and pool their effects into the error term. The output terms variable returns a matrix of codes, each row of which is a bit pattern representing a term: a 1 in column m means that factor m is part of that term.
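
To make the encoding concrete, the sketch below translates each row of the terms matrix into a readable term name. It is illustrative only: the matrix is typed in by hand to match the output shown above, and termnames is a hypothetical variable name.

```matlab
% Sketch: decode each row of a terms matrix into a term name.
varnames = {'Origin','4Cyl','MfgDate'};
terms = [1 0 0
         0 1 0
         0 0 1
         1 1 0
         1 0 1
         0 1 1];

termnames = cell(size(terms,1),1);
for t = 1:size(terms,1)
    % Keep the factors flagged with 1 in this row and join them with '*'.
    termnames{t} = strjoin(varnames(logical(terms(t,:))),'*');
end
disp(termnames)
```

Rows 4 and 6 decode to Origin*4Cyl and 4Cyl*MfgDate, the two interactions discussed above.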

Omit terms from the model by deleting their entries from terms.

terms([4 6],:) = []

terms = 4×3

     1     0     0
     0     1     0
     0     0     1
     1     0     1

Run anovan again, this time supplying the resulting matrix as the model argument. Also return the statistics required for multiple comparisons of factors.

[~,~,stats] = anovan(MPG,{org cyl4 when},terms,3,varnames)

[Figure: N-Way ANOVA table]

stats = struct with fields:
         source: 'anovan'
          resid: [3.1235 0.1235 3.1235 1.1235 2.1235 0.1235 -0.8765 -0.8765 -0.8765 0.1235 NaN NaN NaN NaN NaN 0.1235 -0.8765 NaN 0.1235 -0.8765 -2.3832 7.1235 3.1235 6.1235 0.6168 1.2857 0.2857 -0.7143 0.2857 1.2857 6.1235 -4.8765 … ] (1×406 double)
         coeffs: [18×1 double]
            Rtr: [10×10 double]
       rowbasis: [10×18 double]
            dfe: 388
            mse: 14.1056
    nullproject: [18×10 double]
          terms: [4×3 double]
        nlevels: [3×1 double]
     continuous: [0 0 0]
         vmeans: [3×1 double]
       termcols: [5×1 double]
     coeffnames: {18×1 cell}
           vars: [18×3 double]
       varnames: {3×1 cell}
       grpnames: {3×1 cell}
        vnested: []
            ems: [5×5 double]
          denom: []
        dfdenom: []
        msdenom: []
         varest: []
          varci: []
       txtdenom: []
         txtems: []
        rtnames: []

Now you have a more parsimonious model indicating that the mileage of these cars seems to be related to all three factors, and that the effect of the manufacturing date depends on where the car was made.

Perform multiple comparisons for Origin and Cylinder.

[results,~,~,gnames] = multcompare(stats,'Dimension',[1,2]);

[Figure: Multiple comparison of population marginal means. Five groups have population marginal means significantly different from Origin=USA,4Cyl=Other.]

Display the multiple comparison results and the corresponding group names in a table.

tbl = array2table(results,"VariableNames", ...
    ["Group A","Group B","Lower Limit","A-B","Upper Limit","P-value"]);
tbl.("Group A") = gnames(tbl.("Group A"));
tbl.("Group B") = gnames(tbl.("Group B"))

tbl=15×6 table
              Group A                         Group B               Lower Limit      A-B      Upper Limit     P-value  
    ____________________________    ____________________________    ___________    _______    ___________    __________

    {'Origin=USA,4Cyl=Other'   }    {'Origin=Japan,4Cyl=Other' }      -5.4891      -3.8412      -2.1932      4.2334e-10
    {'Origin=USA,4Cyl=Other'   }    {'Origin=Europe,4Cyl=Other'}      -4.4146      -2.7251      -1.0356      6.2974e-05
    {'Origin=USA,4Cyl=Other'   }    {'Origin=USA,4Cyl=Four'    }      -9.9992      -8.5828      -7.1664               0
    {'Origin=USA,4Cyl=Other'   }    {'Origin=Japan,4Cyl=Four'  }      -14.024      -12.424      -10.824               0
    {'Origin=USA,4Cyl=Other'   }    {'Origin=Europe,4Cyl=Four' }      -12.898      -11.308       -9.718               0
    {'Origin=Japan,4Cyl=Other' }    {'Origin=Europe,4Cyl=Other'}     -0.71714        1.116       2.9492          0.5085
    {'Origin=Japan,4Cyl=Other' }    {'Origin=USA,4Cyl=Four'    }      -7.3655      -4.7417      -2.1179      3.8678e-06
    {'Origin=Japan,4Cyl=Other' }    {'Origin=Japan,4Cyl=Four'  }      -9.9992      -8.5828      -7.1664               0
    {'Origin=Japan,4Cyl=Other' }    {'Origin=Europe,4Cyl=Four' }      -9.7464      -7.4668      -5.1872      1.4557e-20
    {'Origin=Europe,4Cyl=Other'}    {'Origin=USA,4Cyl=Four'    }      -8.5396      -5.8577      -3.1757      6.9888e-09
    {'Origin=Europe,4Cyl=Other'}    {'Origin=Japan,4Cyl=Four'  }      -12.052      -9.6988      -7.3459               0
    {'Origin=Europe,4Cyl=Other'}    {'Origin=Europe,4Cyl=Four' }      -9.9992      -8.5828      -7.1664               0
    {'Origin=USA,4Cyl=Four'    }    {'Origin=Japan,4Cyl=Four'  }      -5.4891      -3.8412      -2.1932      4.2334e-10
    {'Origin=USA,4Cyl=Four'    }    {'Origin=Europe,4Cyl=Four' }      -4.4146      -2.7251      -1.0356      6.2974e-05
    {'Origin=Japan,4Cyl=Four'  }    {'Origin=Europe,4Cyl=Four' }     -0.71714        1.116       2.9492          0.5085
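
The comparison results can be filtered to show only the pairs whose marginal means are not significantly different. A minimal sketch, assuming a 0.05 cutoff and the reduced model fitted above; the names alphaLevel and notSig are hypothetical, and the model is refitted here with the figures suppressed so that the block runs on its own.

```matlab
% Sketch: list the group pairs that are NOT significantly different.
load carbig
terms = [1 0 0; 0 1 0; 0 0 1; 1 0 1];        % main effects plus Origin*MfgDate
[~,~,stats] = anovan(MPG,{org cyl4 when},'model',terms,'sstype',3, ...
    'varnames',{'Origin','4Cyl','MfgDate'},'display','off');
[results,~,~,gnames] = multcompare(stats,'Dimension',[1 2],'Display','off');

% Column 6 of results holds the p-value for each pairwise comparison.
alphaLevel = 0.05;
notSig = results(results(:,6) >= alphaLevel,:);
for n = 1:size(notSig,1)
    fprintf('%s vs %s: p = %.4f\n', ...
        gnames{notSig(n,1)},gnames{notSig(n,2)},notSig(n,6));
end
```

In the table above, the two Japan versus Europe comparisons are the only pairs with p-values above this cutoff.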
