x2fx

Convert predictor matrix to design matrix

Description

D = x2fx(X,modelspec) converts a matrix of predictors X using the model terms modelspec into a design matrix D for regression analysis.

D = x2fx(X,modelspec,CategoricalPredictors) treats the columns in X with indices listed in CategoricalPredictors as categorical predictors.

D = x2fx(X,modelspec,CategoricalPredictors,CategoricalLevels) specifies the number of levels for each categorical predictor in CategoricalPredictors. If you specify CategoricalLevels, the values in each corresponding column of X must be integers in the range from 1 to the specified number of levels. X does not need to include every level value.

Examples

Convert two predictors in the columns of a matrix X into a design matrix for a full quadratic model with the terms constant, X1, X2, X1.*X2, X1.^2, and X2.^2.

X = [1 10; 4 20; 8 10; 16 20]
X = 4×2

     1    10
     4    20
     8    10
    16    20

D = x2fx(X,"quadratic")
D = 4×6

     1     1    10    10     1   100
     1     4    20    80    16   400
     1     8    10    80    64   100
     1    16    20   320   256   400

Create a matrix X that contains five observations of two categorical predictors.

X = [1 1; 1 1; 1 2; 2 1; 2 2]
X = 5×2

     1     1
     1     1
     1     2
     2     1
     2     2

Convert X into a design matrix. Use a linear model and specify the two predictors as categorical.

D = x2fx(X,"linear",[1 2])
D = 5×3

     1     1     1
     1     1     1
     1     1     0
     1     0     1
     1     0     0

Create a new design matrix using a linear model where both predictors have three levels.

D = x2fx(X,"linear",[1 2],[3 3])
D = 5×5

     1     1     0     1     0
     1     1     0     1     0
     1     1     0     0     1
     1     0     1     1     0
     1     0     1     0     1

The design matrix now has five columns because each predictor is declared to have three levels, so each one contributes two dummy variable columns instead of one.

Input Arguments

X

Predictor variables, specified as an n-by-p matrix, where n is the number of observations and p is the number of predictor variables. Each column of X represents one variable, and each row represents one observation.

Data Types: single | double

modelspec

Model terms, specified as a value in the following table or as a numeric matrix.

Value                               Model Contents
"linear" or "additive" (default)    Constant and linear terms
"interactions"                      Constant, linear, and interaction terms
"quadratic"                         Constant, linear, interaction, and squared terms
"purequadratic"                     Constant, linear, and squared terms

If you specify modelspec as a numeric matrix, it must contain one column for each predictor and one row for each polynomial term in the model. The entries in each row are exponents for the predictors in the columns of X. For example, if a model has predictors X1, X2, and X3, then row [0 1 2] in modelspec specifies the term (X1.^0).*(X2.^1).*(X3.^2). A row of all zeros in modelspec specifies a constant term, which can be omitted.

Data Types: single | double | char | string
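To illustrate the numeric form of modelspec, the following sketch writes the full quadratic model for two predictors as an exponent matrix and compares the result with the named "quadratic" model. It reuses the matrix X from the first example. The variable names model, Dmatrix, and Dnamed are illustrative only.

```matlab
X = [1 10; 4 20; 8 10; 16 20];

% One row per term, one column per predictor; entries are exponents.
% Rows, in order: constant, X1, X2, X1.*X2, X1.^2, X2.^2
model = [0 0; 1 0; 0 1; 1 1; 2 0; 0 2];

Dmatrix = x2fx(X,model);        % design matrix from the exponent matrix
Dnamed  = x2fx(X,"quadratic");  % design matrix from the named model

% Both calls are expected to describe the same six terms in the same order
tf = isequal(Dmatrix,Dnamed)
```

Writing the model as a matrix is mainly useful when you need a term set that no named value provides, for example by deleting rows from model.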

CategoricalPredictors

Indices of the categorical predictors in X, specified as a numeric vector of positive integers. Terms involving categorical predictors produce dummy variable columns in D. The function computes dummy variables with the assumption that possible categorical levels are completely enumerated by the unique values that appear in the corresponding columns of X.

Data Types: single | double

CategoricalLevels

Number of levels for the categorical predictors, specified as a vector of positive integers with the same length as CategoricalPredictors.

Data Types: single | double
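The effect of CategoricalLevels on the size of D can be checked with a short sketch. It reuses the categorical matrix X from the examples and assumes, consistent with the outputs shown there, that a "linear" model produces one constant column plus one dummy variable column per level beyond the first for each categorical predictor. The variable names are illustrative only.

```matlab
X = [1 1; 1 1; 1 2; 2 1; 2 2];
catIdx = [1 2];                 % both columns of X are categorical

% Levels inferred from the unique values in each column of X (2 levels each)
Dinferred = x2fx(X,"linear",catIdx);

% Levels declared explicitly (3 levels each), although level 3 never appears in X
levels = [3 3];
Ddeclared = x2fx(X,"linear",catIdx,levels);

% Assumed column count for a linear model: 1 + sum(levels - 1)
expectedCols = 1 + sum(levels - 1);
fprintf("inferred: %d columns, declared: %d columns, expected: %d\n", ...
    size(Dinferred,2),size(Ddeclared,2),expectedCols)
```

Declaring the levels explicitly keeps the column layout of D stable across data sets in which some levels happen not to occur.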

Output Arguments

D

Design matrix, returned as a numeric matrix with the same number of rows as X. The number of columns in D depends on the value of modelspec, the number of predictors in X, and the number of levels for each categorical predictor in X.
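As a rough illustration of how the column count depends on modelspec, the following sketch counts the columns that each named model produces for a numeric predictor matrix with p = 3 columns and no categorical predictors. The matrix X is made-up illustration data. The expected counts are derived from the term lists for the named models: 1 constant, p linear, p(p-1)/2 interaction, and p squared terms.

```matlab
% Illustrative data (an assumption, not from the documentation):
% 4 observations of p = 3 numeric predictors
X = [1 2 3; 4 5 6; 7 8 10; 2 0 1];
p = size(X,2);

names = ["linear" "interactions" "quadratic" "purequadratic"];
ncols = zeros(1,numel(names));
for k = 1:numel(names)
    D = x2fx(X,names(k));   % design matrix for the named model
    ncols(k) = size(D,2);   % number of columns, one per model term
end

% Term counts implied by the model contents of each named value
expected = [1+p, 1+p+p*(p-1)/2, 1+2*p+p*(p-1)/2, 1+2*p];
disp([ncols; expected])
```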

Assuming that X has p columns, and you specify modelspec as "quadratic" or a numeric matrix that includes constant, linear, interaction, and squared terms, the columns of D (in order) are:

  1. Constant term

  2. Linear terms in the order 1, 2, ..., p

  3. Interaction terms in the order (1, 2), (1, 3), ..., (1, p), (2, 3), ..., (p – 1, p)

  4. Squared terms in the order 1, 2, ..., p

If you specify any other named value for modelspec, then D contains a subset of these terms, in the same order.
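The column order described above can be reproduced by hand for numeric predictors. The sketch below is not how x2fx is implemented. It only assembles the same terms in the documented order and compares the result with the "quadratic" output. The data and variable names are illustrative assumptions.

```matlab
% Illustrative data (an assumption): 4 observations of p = 3 numeric predictors
X = [1 2 3; 4 5 6; 7 8 10; 2 0 1];
[n,p] = size(X);

pairs = nchoosek(1:p,2);    % index pairs in the order (1,2), (1,3), (2,3)

constant     = ones(n,1);                          % 1. constant term
linear       = X;                                  % 2. linear terms 1, ..., p
interactions = X(:,pairs(:,1)).*X(:,pairs(:,2));   % 3. pairwise products
squared      = X.^2;                               % 4. squared terms 1, ..., p

Dmanual = [constant linear interactions squared];

% Compare with the design matrix that x2fx returns for the same model
tf = isequal(Dmanual,x2fx(X,"quadratic"))
```

Knowing the order is useful when you map regression coefficients back to terms, because the k-th coefficient corresponds to the k-th column of D.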

Version History

Introduced before R2006a