Set Up Multivariate Regression Problems

Response Matrix

To fit a multivariate linear regression model using mvregress, you must set up your response matrix and design matrices in a particular way. Given properly formatted inputs, mvregress can handle a variety of multivariate regression problems.

mvregress expects the n observations of potentially correlated d-dimensional responses to be in an n-by-d matrix, named Y, for example. That is, set up your responses so that the dependency structure is between observations in the same row. If you specify Y as a vector of length n (either a row or column vector), then mvregress assumes that d = 1, and treats the elements as n independent observations. It does not model the vector as one realization of a correlated series (such as a time series).

To illustrate how to set up a response matrix, suppose that your multivariate responses are repeated measurements made on subjects at multiple time points, as in the following figure.

Plot of repeated measurements, where each line corresponds to one subject. The x-axis shows the time points at which the measurements are made.

Suppose that observations within a subject are correlated.

Plot of repeated measurements, where the dark blue points indicate within subject correlation

In this case, set up the response matrix Y such that each row corresponds to a subject, and each column corresponds to a time point.

Response matrix with subjects in rows and time points in columns

Alternatively, suppose that observations made on different subjects at the same time are correlated (concurrent correlation).

Plot of repeated measurements, where the dark blue points indicate between subject correlation

In this case, set up the response matrix Y such that each row corresponds to a time point, and each column corresponds to a subject.

Response matrix with time points in rows and subjects in columns
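The two layouts can be sketched in code. This is a minimal illustration, assuming hypothetical long-format data (the variables subject, timeIdx, and y are invented for the example and are not part of mvregress):

```matlab
% Hypothetical long-format data: 3 subjects, each measured at 4 time points.
subject = [1 1 1 1 2 2 2 2 3 3 3 3]';
timeIdx = [1 2 3 4 1 2 3 4 1 2 3 4]';
y       = (1:12)';                       % placeholder measurements

nSubj = max(subject);
nTime = max(timeIdx);

% Within-subject correlation: rows are subjects, columns are time points.
Ywithin = accumarray([subject timeIdx], y, [nSubj nTime]);

% Concurrent correlation: rows are time points, columns are subjects.
Yconcurrent = Ywithin';
```

Either matrix can be passed to mvregress as Y. The choice determines which observations the estimated covariance matrix links together.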

Design Matrices

In the multivariate linear regression model, each d-dimensional response has a corresponding design matrix. Depending on the model, the design matrix might consist of exogenous predictor variables, dummy variables, lagged responses, or a combination of these and other covariate terms.

  • If d > 1 and all d dimensions have the same design matrix, then specify one n-by-p design matrix, where p is the number of predictor variables. To estimate an intercept for each dimension, include a column of ones in the design matrix. In this case, mvregress applies the design matrix to all d dimensions.

  • If d > 1 and the d dimensions do not all have the same design matrix, then specify the design matrices using a length-n cell array of d-by-K arrays, named X, for example. K is the total number of regression coefficients in the model. The rows of each array in X correspond to the columns of the response matrix, Y.

    Cell array of design matrices

    If all n observations have the same design matrix, you can specify a cell array containing one d-by-K design matrix. In this case, mvregress applies the design matrix to all n observations. For example, this situation might arise if the predictors are functions of time, and all observations were measured at the same time points.

  • In the special case that d = 1, you can specify one n-by-K design matrix (not in a cell array). However, you should consider using fitlm to fit regression models to univariate, continuous responses.
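As a rough sketch of the input shapes described in this list, the following builds one design input of each kind. The predictor values Z and the time vector t are placeholders invented for the example:

```matlab
rng(0);                          % reproducible placeholder data
n = 6; d = 2; p = 3;
Z = randn(n, p);                 % hypothetical predictors

% Same design matrix for every dimension: one numeric matrix with a
% column of ones for the intercepts.
Xshared = [ones(n,1) Z];

% Different design per dimension: length-n cell array of d-by-K arrays.
K = d*(p + 1);
Xcell = cell(n, 1);
for i = 1:n
    Xcell{i} = kron([1 Z(i,:)], eye(d));    % d-by-K
end

% All observations share one d-by-K design matrix, for example when the
% predictors are functions of common measurement times.
t = (1:d)';
Xsame = {[ones(d,1) t]};         % cell array with a single d-by-2 matrix
```

The call itself has the same form for each of these inputs, for example beta = mvregress(Xcell,Y), where Y is the n-by-d response matrix.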

The following sections illustrate how to set up some common multivariate regression problems for estimation using mvregress.

Common Multivariate Regression Problems

Multivariate General Linear Model

The multivariate general linear model is of the form

$$Y_{n \times d} = X_{n \times (p+1)} B_{(p+1) \times d} + E_{n \times d}.$$

In expanded form,

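$$
\begin{bmatrix}
y_{11} & y_{12} & \cdots & y_{1d} \\
y_{21} & y_{22} & \cdots & y_{2d} \\
\vdots & \vdots & \ddots & \vdots \\
y_{n1} & y_{n2} & \cdots & y_{nd}
\end{bmatrix}
=
\begin{bmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1p} \\
1 & x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}
\begin{bmatrix}
\beta_{01} & \beta_{02} & \cdots & \beta_{0d} \\
\beta_{11} & \beta_{12} & \cdots & \beta_{1d} \\
\vdots & \vdots & \ddots & \vdots \\
\beta_{p1} & \beta_{p2} & \cdots & \beta_{pd}
\end{bmatrix}
+
\begin{bmatrix}
\varepsilon_{11} & \varepsilon_{12} & \cdots & \varepsilon_{1d} \\
\varepsilon_{21} & \varepsilon_{22} & \cdots & \varepsilon_{2d} \\
\vdots & \vdots & \ddots & \vdots \\
\varepsilon_{n1} & \varepsilon_{n2} & \cdots & \varepsilon_{nd}
\end{bmatrix}.
$$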

That is, each d-dimensional response has an intercept and p predictor variables, and each dimension has its own set of regression coefficients. In this form, the least squares solution is B = X\Y. To estimate this model using mvregress, use the n-by-d matrix of responses, as above.

If all d dimensions have the same design matrix, use the n-by-(p+1) design matrix, as above. Including a column of ones alongside the p predictor variables lets mvregress estimate an intercept for each dimension.
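A minimal sketch of this shared-design case, using simulated data. The coefficient matrix Btrue, the noise level, and the predictor values are made up for the illustration. The mvregress call is guarded so that the rest of the script still runs where Statistics and Machine Learning Toolbox is not installed:

```matlab
rng(1);
n = 50; d = 3; p = 2;
Z = randn(n, p);                          % hypothetical predictors
Btrue = [1 2 3; 0.5 -1 0; 2 0 1];         % (p+1)-by-d, invented for the demo

X = [ones(n,1) Z];                        % n-by-(p+1), ones give the intercepts
Y = X*Btrue + 0.1*randn(n, d);            % n-by-d simulated responses

Bols = X \ Y;                             % least squares solution, (p+1)-by-d

if exist('mvregress', 'file')
    % Same X for all d dimensions, so pass the numeric matrix directly.
    [Bmle, Sigma] = mvregress(X, Y);
end
```

With complete data and a common design matrix, the coefficient estimates from mvregress would be expected to agree closely with Bols. The additional output Sigma is the estimated d-by-d error covariance.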

If the d dimensions do not all have the same design matrix, reformat the n-by-(p + 1) design matrix into a length-n cell array of d-by-K matrices. Here, K = (p + 1)d for an intercept and p slopes for each dimension.

For example, suppose n = 4, d = 3, and p = 2 (two predictor terms in addition to an intercept). This figure shows how to format the ith element in the cell array.

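$$
\begin{bmatrix}
y_{11} & y_{12} & y_{13} \\
y_{21} & y_{22} & y_{23} \\
y_{31} & y_{32} & y_{33} \\
y_{41} & y_{42} & y_{43}
\end{bmatrix}
=
\begin{bmatrix}
1 & x_{11} & x_{12} \\
1 & x_{21} & x_{22} \\
1 & x_{31} & x_{32} \\
1 & x_{41} & x_{42}
\end{bmatrix}
\begin{bmatrix}
\beta_{01} & \beta_{02} & \beta_{03} \\
\beta_{11} & \beta_{12} & \beta_{13} \\
\beta_{21} & \beta_{22} & \beta_{23}
\end{bmatrix}
+
\begin{bmatrix}
\varepsilon_{11} & \varepsilon_{12} & \varepsilon_{13} \\
\varepsilon_{21} & \varepsilon_{22} & \varepsilon_{23} \\
\varepsilon_{31} & \varepsilon_{32} & \varepsilon_{33} \\
\varepsilon_{41} & \varepsilon_{42} & \varepsilon_{43}
\end{bmatrix}
$$

$$
\underbrace{
\begin{bmatrix}
1 & 0 & 0 & x_{i1} & 0 & 0 & x_{i2} & 0 & 0 \\
0 & 1 & 0 & 0 & x_{i1} & 0 & 0 & x_{i2} & 0 \\
0 & 0 & 1 & 0 & 0 & x_{i1} & 0 & 0 & x_{i2}
\end{bmatrix}
}_{X\{i\}}
\begin{bmatrix}
\beta_{01} \\ \beta_{02} \\ \beta_{03} \\ \beta_{11} \\ \beta_{12} \\ \beta_{13} \\ \beta_{21} \\ \beta_{22} \\ \beta_{23}
\end{bmatrix}
$$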

If you prefer, you can reshape the K-by-1 vector of coefficients back into a (p + 1)-by-d matrix after estimation.
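The n = 4, d = 3, p = 2 layout can be built with a Kronecker product, and the coefficient vector reshaped afterwards. This is a sketch: the predictor values are placeholders, and betaVec stands in for the K-by-1 estimate that mvregress would return.

```matlab
n = 4; d = 3; p = 2;
x = [0.1 0.2; 0.3 0.4; 0.5 0.6; 0.7 0.8];    % hypothetical n-by-p predictors

X = cell(n, 1);
for i = 1:n
    % [I, x_i1*I, x_i2*I]: a d-by-K matrix with K = (p+1)*d = 9
    X{i} = kron([1 x(i,:)], eye(d));
end

% Coefficients are ordered [b01 b02 b03 b11 b12 b13 b21 b22 b23]'.
betaVec = (1:9)';                            % stand-in for the estimate

% Fill a d-by-(p+1) matrix column by column, then transpose, so that
% row k of B holds the coefficients of term k for all d dimensions.
B = reshape(betaVec, d, p + 1)';
```

The transpose matters because reshape fills column by column, while the coefficient vector lists all d intercepts first, then all d coefficients of the first predictor, and so on.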

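To put constraints on the model parameters, adjust the design matrix accordingly. For example, suppose that the three dimensions in the previous example share common slopes. That is, $\beta_{11} = \beta_{12} = \beta_{13} = \beta_1$ and $\beta_{21} = \beta_{22} = \beta_{23} = \beta_2$. In this case, each design matrix is 3-by-5, as shown in the following figure.

$$
\underbrace{
\begin{bmatrix}
1 & 0 & 0 & x_{i1} & x_{i2} \\
0 & 1 & 0 & x_{i1} & x_{i2} \\
0 & 0 & 1 & x_{i1} & x_{i2}
\end{bmatrix}
}_{X\{i\}}
\begin{bmatrix}
\beta_{01} \\ \beta_{02} \\ \beta_{03} \\ \beta_1 \\ \beta_2
\end{bmatrix}
$$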
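A short sketch of the common-slope design, again with placeholder predictor values:

```matlab
n = 4; d = 3;
x = [0.1 0.2; 0.3 0.4; 0.5 0.6; 0.7 0.8];    % hypothetical n-by-2 predictors

X = cell(n, 1);
for i = 1:n
    % d separate intercept columns, then one shared column per slope.
    X{i} = [eye(d) repmat(x(i,:), d, 1)];    % 3-by-5
end
```

Because each slope column is repeated across all d rows rather than spread over a diagonal block, a single coefficient multiplies that predictor in every dimension.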

Longitudinal Analysis

In a longitudinal analysis, you might measure responses on n subjects at d time points, with correlation between observations made on the same subject. For example, suppose that you measure responses $y_{ij}$ at times $t_{ij}$, i = 1,...,n and j = 1,...,d. In addition, suppose that each subject is in one of two groups (such as male or female), specified by the indicator variable $G_i$. You could model $y_{ij}$ as a function of $G_i$ and $t_{ij}$, with group-specific intercepts and slopes, as follows:

$$y_{ij} = \beta_0 + \beta_1 G_i + \beta_2 t_{ij} + \beta_3 G_i \times t_{ij} + \varepsilon_{ij}, \quad i = 1,\ldots,n;\ j = 1,\ldots,d,$$

where

$$\varepsilon_i = (\varepsilon_{i1},\ldots,\varepsilon_{id})' \sim MVN(0,\Sigma).$$

Most longitudinal models include time as an explicit predictor.

To fit this model using mvregress, arrange the responses in an n-by-d matrix, where n is the number of subjects and d is the number of time points. Specify the design matrices in an n-length cell array of d-by-K matrices, where here K = 4 for the four regression coefficients.

For example, suppose d = 5 (five observations per subject). The ith design matrix and corresponding parameter vector for the specified model are shown in the following figure.

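$$
\underbrace{
\begin{bmatrix}
1 & G_i & t_{i1} & G_i \times t_{i1} \\
1 & G_i & t_{i2} & G_i \times t_{i2} \\
1 & G_i & t_{i3} & G_i \times t_{i3} \\
1 & G_i & t_{i4} & G_i \times t_{i4} \\
1 & G_i & t_{i5} & G_i \times t_{i5}
\end{bmatrix}
}_{X\{i\}}
\begin{bmatrix}
\beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3
\end{bmatrix}
$$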
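The longitudinal setup might be coded as follows. This is a sketch on simulated data: the group assignments, measurement times, coefficients in betaTrue, and noise level are all invented for the example.

```matlab
rng(2);
n = 20; d = 5;
G = double(rand(n,1) > 0.5);           % hypothetical group indicator (0 or 1)
t = repmat(1:d, n, 1);                 % hypothetical times, t(i,j)

X = cell(n, 1);
for i = 1:n
    ti = t(i,:)';
    % Columns: intercept, group, time, group-by-time interaction.
    X{i} = [ones(d,1), G(i)*ones(d,1), ti, G(i)*ti];   % d-by-4
end

% Simulate responses: one row per subject, one column per time point.
betaTrue = [1; 0.5; 2; -1];            % made-up coefficients
Y = zeros(n, d);
for i = 1:n
    Y(i,:) = (X{i}*betaTrue + 0.1*randn(d,1))';
end

if exist('mvregress', 'file')
    [beta, Sigma] = mvregress(X, Y);   % beta is 4-by-1, Sigma is d-by-d
end
```

Here Sigma captures the within-subject covariance across the d time points.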

Panel Analysis

In a panel analysis, you might measure responses and covariates on d subjects (such as individuals or countries) at n time points. For example, suppose you measure responses $y_{tj}$ and covariates $x_{tj}$ on subjects j = 1,...,d at times t = 1,...,n. A fixed effects panel model, with subject-specific fixed effects and concurrent correlation, might look like:

$$y_{tj} = \alpha_j + \beta x_{tj} + \varepsilon_{tj},$$

where

$$\varepsilon_t = (\varepsilon_{t1},\ldots,\varepsilon_{td})' \sim MVN(0,\Sigma).$$

In contrast to longitudinal models, the panel analysis model typically includes covariates measured at each time point, instead of using time as an explicit predictor.

To fit this model using mvregress, arrange the responses in an n-by-d matrix, such that each column corresponds to a subject. Specify the design matrices in an n-length cell array of d-by-K matrices, where here K = d + 1 for the d intercepts and a slope term.

For example, suppose d = 4 (four subjects). The tth design matrix and corresponding parameter vector are shown in the following figure.

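$$
\underbrace{
\begin{bmatrix}
1 & 0 & 0 & 0 & x_{t1} \\
0 & 1 & 0 & 0 & x_{t2} \\
0 & 0 & 1 & 0 & x_{t3} \\
0 & 0 & 0 & 1 & x_{t4}
\end{bmatrix}
}_{X\{t\}}
\begin{bmatrix}
\alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \\ \beta
\end{bmatrix}
$$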
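A sketch of the panel design for d = 4 subjects, using simulated data. The covariate values, subject effects, and slope are invented for the example.

```matlab
rng(3);
n = 30; d = 4;
x = randn(n, d);                        % hypothetical covariate, x(t,j)
alpha = [1 2 3 4];                      % made-up subject-specific effects
slope = 0.5;                            % made-up common slope

% Rows are time points, columns are subjects.
Y = repmat(alpha, n, 1) + slope*x + 0.1*randn(n, d);

X = cell(n, 1);
for t = 1:n
    % Identity block gives one intercept per subject; the last column
    % holds the covariate values at time t for the common slope.
    X{t} = [eye(d) x(t,:)'];            % d-by-(d+1)
end

if exist('mvregress', 'file')
    beta = mvregress(X, Y);             % ordered [alpha1 ... alpha4 beta]'
end
```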

Seemingly Unrelated Regression

In a seemingly unrelated regression (SUR), you model d separate regressions, each with its own intercept and slope, but a common error variance-covariance matrix. For example, suppose you measure responses $y_{ij}$ and covariates $x_{ij}$ for regression models j = 1,...,d, with i = 1,...,n observations to fit each regression. The SUR model might look like:

$$y_{ij} = \beta_{0j} + \beta_j x_{ij} + \varepsilon_{ij},$$

where

$$\varepsilon_i = (\varepsilon_{i1},\ldots,\varepsilon_{id})' \sim MVN(0,\Sigma).$$

This model is very similar to the multivariate general linear model, except that it has different covariates for each dimension.

To fit this model using mvregress, arrange the responses in an n-by-d matrix, such that column j has the data for the jth regression model. Specify the design matrices in an n-length cell array of d-by-K matrices, where here K = 2d for d intercepts and d slopes.

For example, suppose d = 3 (three regressions). The ith design matrix and corresponding parameter vector are shown in the following figure.

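$$
\underbrace{
\begin{bmatrix}
1 & 0 & 0 & x_{i1} & 0 & 0 \\
0 & 1 & 0 & 0 & x_{i2} & 0 \\
0 & 0 & 1 & 0 & 0 & x_{i3}
\end{bmatrix}
}_{X\{i\}}
\begin{bmatrix}
\beta_{01} \\ \beta_{02} \\ \beta_{03} \\ \beta_1 \\ \beta_2 \\ \beta_3
\end{bmatrix}
$$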
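The SUR design for d = 3 can be sketched with a diagonal block for the slopes. The data below are simulated, and the intercepts b0 and slopes b1 are invented for the example.

```matlab
rng(4);
n = 30; d = 3;
x = randn(n, d);                        % hypothetical covariates, x(i,j)
b0 = [1 2 3];                           % made-up intercepts
b1 = [0.5 -1 2];                        % made-up slopes

% Column j holds the responses for regression j.
Y = repmat(b0, n, 1) + x .* repmat(b1, n, 1) + 0.1*randn(n, d);

X = cell(n, 1);
for i = 1:n
    % Each regression gets its own intercept and its own covariate column.
    X{i} = [eye(d) diag(x(i,:))];       % d-by-2d
end

if exist('mvregress', 'file')
    [beta, Sigma] = mvregress(X, Y);    % beta ordered [b01 b02 b03 b1 b2 b3]'
end
```

Placing the covariates on a diagonal is what distinguishes this design from the common-slope case: regression j sees only its own covariate.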

Vector Autoregressive Model

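The VAR(p) vector autoregressive model expresses d-dimensional time series responses as a linear function of p lagged d-dimensional responses from previous times. For example, suppose you measure responses $y_{tj}$ for time series j = 1,...,d at times t = 1,...,n. The VAR(p) model might look like:

$$
\begin{bmatrix} y_{t1} \\ y_{t2} \\ \vdots \\ y_{td} \end{bmatrix}
=
\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_d \end{bmatrix}
+
\begin{bmatrix}
\phi_{11}^{(1)} & \phi_{12}^{(1)} & \cdots & \phi_{1d}^{(1)} \\
\vdots & \vdots & \ddots & \vdots \\
\phi_{d1}^{(1)} & \phi_{d2}^{(1)} & \cdots & \phi_{dd}^{(1)}
\end{bmatrix}
\begin{bmatrix} y_{t-1,1} \\ y_{t-1,2} \\ \vdots \\ y_{t-1,d} \end{bmatrix}
+ \cdots +
\begin{bmatrix}
\phi_{11}^{(p)} & \phi_{12}^{(p)} & \cdots & \phi_{1d}^{(p)} \\
\vdots & \vdots & \ddots & \vdots \\
\phi_{d1}^{(p)} & \phi_{d2}^{(p)} & \cdots & \phi_{dd}^{(p)}
\end{bmatrix}
\begin{bmatrix} y_{t-p,1} \\ y_{t-p,2} \\ \vdots \\ y_{t-p,d} \end{bmatrix}
+
\begin{bmatrix} \varepsilon_{t1} \\ \varepsilon_{t2} \\ \vdots \\ \varepsilon_{td} \end{bmatrix},
$$

where

$$\varepsilon_t = (\varepsilon_{t1},\ldots,\varepsilon_{td})' \sim MVN(0,\Sigma).$$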

When estimating vector autoregressive models, you typically need to use the first p observations to initialize the model, or provide some other presample response values.

To fit this model using mvregress, arrange the responses in an n-by-d matrix, such that each column corresponds to a time series. Specify the design matrices in an n-length cell array of d-by-K matrices, where here $K = d + pd^2$ (d intercepts plus $d^2$ autoregressive coefficients for each of the p lags).

For example, suppose d = 2 (two time series) and p = 1 (one lag). The tth design matrix and corresponding parameter vector are shown in the following figure.

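$$
\underbrace{
\begin{bmatrix}
1 & 0 & y_{t-1,1} & 0 & y_{t-1,2} & 0 \\
0 & 1 & 0 & y_{t-1,1} & 0 & y_{t-1,2}
\end{bmatrix}
}_{X\{t\}}
\begin{bmatrix}
c_1 \\ c_2 \\ \phi_{11}^{(1)} \\ \phi_{21}^{(1)} \\ \phi_{12}^{(1)} \\ \phi_{22}^{(1)}
\end{bmatrix}
$$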
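A sketch of the VAR(1) setup for d = 2, using a simulated series. The constants c, the coefficient matrix Phi, and the noise level are invented for the example. The first observation serves as the presample value, so the response matrix has one row fewer than the series.

```matlab
rng(5);
T = 100; d = 2;
c   = [0.1; -0.2];                       % made-up constants
Phi = [0.5 0.1; 0.2 0.3];                % made-up lag-1 coefficient matrix

% Simulate the series, starting from zeros.
y = zeros(T, d);
for t = 2:T
    y(t,:) = (c + Phi*y(t-1,:)' + 0.1*randn(d,1))';
end

Y = y(2:end,:);                          % responses, (T-1)-by-d
n = size(Y, 1);

X = cell(n, 1);
for t = 1:n
    ylag = y(t,:);                       % lagged values for row t of Y
    % [1 0 y1 0 y2 0; 0 1 0 y1 0 y2], a d-by-6 matrix
    X{t} = [eye(d) kron(ylag, eye(d))];
end

if exist('mvregress', 'file')
    beta = mvregress(X, Y);              % [c1 c2 phi11 phi21 phi12 phi22]'
    cHat   = beta(1:d);
    PhiHat = reshape(beta(d+1:end), d, d);   % column-major order matches
end
```

Because the autoregressive coefficients are ordered column by column, a plain reshape recovers the d-by-d coefficient matrix without a transpose.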

Alternatively, Econometrics Toolbox™ has functions for fitting and forecasting VAR(p) models, including the option to specify exogenous predictor variables.
