# dummyvar

Create dummy variables

## Syntax

`D = dummyvar(group)`

## Description

`D = dummyvar(group)` returns a matrix `D` containing zeros and ones, whose columns are dummy variables for the grouping variables in `group`. Each column of `group` is a single grouping variable, with values indicating category levels. The rows of `group` represent observations across all variables.
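As an illustrative sketch (not necessarily how `dummyvar` is implemented internally), the indicator matrix for a single positive-integer grouping variable can be reproduced with a logical comparison. The names `g` and `Dmanual` are hypothetical, and the comparison assumes implicit expansion is available (R2016b or later).

```matlab
% Hypothetical grouping variable with levels 1, 2, and 3
g = [2 1 1 3]';

% Column j of Dmanual is 1 where g equals level j, and 0 elsewhere
Dmanual = double(g == 1:max(g));

% Compare with the toolbox function
D = dummyvar(g);
tf = isequal(D, Dmanual)
```

Each row of `D` contains exactly one 1, located in the column for that observation's level.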

## Examples

Create a column vector of categorical data specifying color types.

```
Colors = {'Red';'Blue';'Green';'Red';'Green';'Blue'};
Colors = categorical(Colors);
```

Create dummy variables for each color type.

```
D = dummyvar(Colors)
```
```
D = 6×3

     0     0     1
     1     0     0
     0     1     0
     0     0     1
     0     1     0
     1     0     0
```

The columns in `D` correspond to the levels in `Colors`. For example, the first column of `D` corresponds to the first level, `'Blue'`, in `Colors`.

Display the category levels of `Colors`.

```
categories(Colors)
```
```
ans = 3x1 cell
    {'Blue' }
    {'Green'}
    {'Red'  }
```

Create a matrix `group` of data containing the effects of two machines and three operators on a process.

```
machine = [1 1 1 1 2 2 2 2]';
operator = [1 2 3 1 2 3 1 2]';
group = [machine operator]
```
```
group = 8×2

     1     1
     1     2
     1     3
     1     1
     2     2
     2     3
     2     1
     2     2
```

Create dummy variables of the data in `group`.

```
D = dummyvar(group)
```
```
D = 8×5

     1     0     1     0     0
     1     0     0     1     0
     1     0     0     0     1
     1     0     1     0     0
     0     1     0     1     0
     0     1     0     0     1
     0     1     1     0     0
     0     1     0     1     0
```

The first two columns of `D` represent observations of machine 1 and machine 2, respectively. The remaining columns represent observations of the three operators.

Create a cell array of phone types and a numeric vector of area codes.

```
phone = {'mobile';'landline';'mobile';'mobile';'mobile';'landline';'landline'};
codes = [802 802 603 603 802 603 802]';
```

Because the area code data has two levels (rather than 802 levels corresponding to the integers `1:802`), convert `codes` to a categorical vector.

`newcodes = categorical(codes);`

Combine the `phone` and `newcodes` grouping variables into the cell array `group`.

`group = {phone,newcodes};`

Create dummy variables for the groups in `group`.

```
D = dummyvar(group)
```
```
D = 7×4

     1     0     0     1
     0     1     0     1
     1     0     1     0
     1     0     1     0
     1     0     0     1
     0     1     1     0
     0     1     0     1
```

The first two columns of `D` correspond to the phone types, and the last two columns correspond to the area codes.

Create dummy variables, and then decode them back into the original data.

Create a column vector of categorical data specifying color types.

```
colorsOriginal = ["red";"blue";"red";"green";"yellow";"blue"];
colorsOriginal = categorical(colorsOriginal)
```
```
colorsOriginal = 6x1 categorical
     red
     blue
     red
     green
     yellow
     blue
```

Determine the classes in the categorical vector.

`classes = categories(colorsOriginal);`

Create dummy variables for each color type by using the `dummyvar` function.

```
dummyColors = dummyvar(colorsOriginal)
```
```
dummyColors = 6×4

     0     0     1     0
     1     0     0     0
     0     0     1     0
     0     1     0     0
     0     0     0     1
     1     0     0     0
```

Decode the dummy variables in the second dimension by using the `onehotdecode` function.

```
colorsDecoded = onehotdecode(dummyColors,classes,2)
```
```
colorsDecoded = 6x1 categorical
     red
     blue
     red
     green
     yellow
     blue
```

The decoded variables match the original color types.

## Input Arguments

`group`: Grouping variables, specified as one of the following:

• A positive integer vector or categorical column vector representing levels within a single variable

• A cell array containing one or more grouping variables

• A positive integer matrix representing levels within multiple variables

If `group` is a categorical vector, then the groups and their order match the output of the `categories` function applied to `group`. If `group` is a numeric vector, then `dummyvar` assumes that the groups and their order are `1:max(group)`. In this respect, `dummyvar` treats a numeric grouping variable differently from `grp2idx`. For information on the order of groups within grouping variables, see Grouping Variables.

Example: `[2 1 1 1 2 3 3 2]'`

Example: `{Origin,Cylinders}`

Data Types: `single` | `double` | `categorical` | `cell`
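The difference between the `1:max(group)` convention and the handling of categorical input can be seen in a small sketch. The data in `g` is hypothetical and chosen so that level 2 never occurs.

```matlab
g = [1 3 3 1]';                    % hypothetical data: level 2 never occurs

D = dummyvar(g);                   % one column per integer in 1:max(g)
idx = grp2idx(g);                  % indexes only the values that occur in g
Dcat = dummyvar(categorical(g));   % one column per category (1 and 3)

size(D)       % 4-by-3; the column for level 2 contains only zeros
size(Dcat)    % 4-by-2
```

Converting to `categorical` first avoids all-zero columns when the numeric levels skip values.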

## Output Arguments

`D`: Dummy variables, returned as an n-by-s numeric matrix, where n is the number of rows of `group` and s is the total number of levels, summed over the columns of `group`. From left to right, the columns of `D` are the dummy variables created from the first column of `group`, followed by the dummy variables created from the second column of `group`, and so on.

Data Types: `single` | `double`
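The column layout described above can be checked with a short sketch. It assumes `group` is a positive integer matrix, so that the number of levels in each column is its maximum value. The names `nLevels`, `edges`, `Dmachine`, and `Doperator` are hypothetical.

```matlab
machine  = [1 1 1 1 2 2 2 2]';
operator = [1 2 3 1 2 3 1 2]';
group = [machine operator];

D = dummyvar(group);

% For positive integer columns, the level count of each column is its maximum
nLevels = max(group, [], 1);       % [2 3]
s = sum(nLevels);                  % number of columns expected in D

% Split D into the block of columns that belongs to each grouping variable
edges = [0 cumsum(nLevels)];
Dmachine  = D(:, edges(1)+1:edges(2));   % columns 1 to 2
Doperator = D(:, edges(2)+1:edges(3));   % columns 3 to 5
```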

## Tips

• Use dummy variables in regression analysis and ANOVA to indicate values of categorical predictors.

• `dummyvar` treats `NaN` values and undefined categorical levels in `group` as missing data and returns `NaN` values in `D`.

• If a column of ones is introduced in the matrix `D`, then the resulting matrix `X = [ones(size(D,1),1) D]` is rank deficient. If `group` has multiple columns, then the matrix `D` itself is rank deficient because dummy variables produced from any column of `group` always sum to a column of ones. Regression and ANOVA calculations often address this issue by eliminating one dummy variable (implicitly setting the coefficients for dropped columns to zero) from each group of dummy variables produced by a column of `group`.

• If `group` is a numeric vector with levels that do not correspond exactly to the integers `1:max(group)`, first convert the data to a categorical vector by using `categorical`. You can then pass the result to `dummyvar`. For an example, see Create Dummy Variables from Multiple Grouping Variables.
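The missing-data and rank-deficiency tips can be sketched as follows, reusing the machine and operator data from the Examples section. Dropping the first dummy variable of each group is only one possible convention and is an assumption here, as are the names `Dnan`, `X`, and `Xreduced`.

```matlab
% Missing values propagate: the row for the NaN observation is all NaN
Dnan = dummyvar([1 NaN 2]');

% Machine and operator data from the Examples section
machine  = [1 1 1 1 2 2 2 2]';
operator = [1 2 3 1 2 3 1 2]';
D = dummyvar([machine operator]);

rD = rank(D);                      % 4, fewer than the 5 columns of D

X = [ones(size(D,1),1) D];
rX = rank(X);                      % still 4, although X has 6 columns

% One possible remedy: drop the first dummy variable of each group
% (columns 1 and 3 of D) and keep an intercept column
Xreduced = [ones(size(D,1),1) D(:,2) D(:,4:5)];
rReduced = rank(Xreduced);         % 4, equal to the number of columns
```

With this choice, the intercept represents machine 1 with operator 1, and the remaining coefficients are offsets relative to that baseline.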

## Alternative Functionality

Alternatively, use `onehotencode` to encode data labels. Consider using `onehotencode` instead of `dummyvar` in these cases:

• To encode a table of categorical data labels

• To specify the dimension to expand for encoding the data labels
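A minimal sketch of both cases, assuming `onehotencode` is available in your installation. The labels in `colors` and the variable names are hypothetical.

```matlab
colors = categorical(["red";"blue";"red";"green"]);   % hypothetical labels

% Expand along dimension 2: one row per observation, as dummyvar does
Drows = onehotencode(colors, 2);       % 4-by-3

% Expand along dimension 1: one column per observation
Dcols = onehotencode(colors', 1);      % 3-by-4

% Encode a table that contains a single categorical variable
tbl = table(colors);
tblEncoded = onehotencode(tbl);        % one table variable per category
```

For a categorical column vector, `onehotencode(colors,2)` should match the output of `dummyvar(colors)`.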

## Version History

Introduced before R2006a