# batchnorm

Normalize each channel of mini-batch

## Syntax

``[dlY,mu,sigmaSq] = batchnorm(dlX,offset,scaleFactor)``
``dlY = batchnorm(dlX,offset,scaleFactor,mu,sigmaSq)``
``[dlY,datasetMu,datasetSigmaSq] = batchnorm(dlX,offset,scaleFactor,datasetMu,datasetSigmaSq)``
``[___] = batchnorm(___,'DataFormat',FMT)``
``[___] = batchnorm(___,Name,Value)``

## Description

The batch normalization operation normalizes each input channel across a mini-batch. To speed up training of convolutional neural networks and reduce the sensitivity to network initialization, use batch normalization between convolution and nonlinear operations such as `relu`.
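As a rough illustration of that placement, the sketch below chains a convolution, `batchnorm`, and `relu` on a small mini-batch. The filter count, filter size, and random weights are hypothetical values chosen only for this example, and the code assumes Deep Learning Toolbox is available.

```matlab
% Mini-batch of two 8-by-8 images with three channels.
dlX = dlarray(rand(8,8,3,2),'SSCB');

% Hypothetical convolution parameters: four 3-by-3 filters over three channels.
weights = dlarray(0.1*randn(3,3,3,4));
bias = dlarray(zeros(4,1));

% Batch normalization parameters, one value per output channel of the convolution.
offset = dlarray(zeros(4,1));
scaleFactor = dlarray(ones(4,1));

dlY = dlconv(dlX,weights,bias);            % convolution (no padding)
dlY = batchnorm(dlY,offset,scaleFactor);   % normalize each channel of the mini-batch
dlY = relu(dlY);                           % nonlinearity applied after normalization
```

Placing the normalization before `relu` means the nonlinearity receives activations with a controlled per-channel mean and variance.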

Note

This function applies the batch normalization operation to `dlarray` data. If you want to apply batch normalization within a `layerGraph` object or `Layer` array, use the `batchNormalizationLayer` layer.

`[dlY,mu,sigmaSq] = batchnorm(dlX,offset,scaleFactor)` normalizes each channel of the input mini-batch `dlX` using the mean and variance statistics computed from each channel and applies a scale factor and offset.

The normalized activation is calculated using the following formula:

$\hat{x}_i = \frac{x_i - \mu_c}{\sqrt{\sigma_c^2 + \epsilon}}$

where $x_i$ is the input activation, $\mu_c$ (`mu`) and $\sigma_c^2$ (`sigmaSq`) are the per-channel mean and variance, respectively, and $\epsilon$ is a small constant. `mu` and `sigmaSq` are calculated over all `'S'` (spatial), `'B'` (batch), `'T'` (time), and `'U'` (unspecified) dimensions in `dlX` for each channel.

The normalized activation is offset and scaled according to the following formula:

$y_i = \gamma \hat{x}_i + \beta$

The offset $\beta$ and scale factor $\gamma$ are specified with the `offset` and `scaleFactor` arguments.

The input `dlX` is a formatted `dlarray` with dimension labels. The output `dlY` is a formatted `dlarray` with the same dimension labels as `dlX`.

`dlY = batchnorm(dlX,offset,scaleFactor,mu,sigmaSq)` normalizes each channel of the input `dlX` using the specified `mu` and `sigmaSq` statistics and applies a scale factor and offset.
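The two formulas above can be traced by hand on an ordinary numeric array. The following is a minimal sketch, not the function's implementation: it assumes an `'SSCB'` layout and assumes the variance is the population variance (normalized by the number of elements), which the text does not state explicitly.

```matlab
% Plain-array sketch of per-channel normalization (no dlarray required).
rng(0);                                  % reproducible random input
X = rand(4,4,3,2);                       % height x width x channels x observations
epsilon = 1e-5;                          % default 'Epsilon' value
offset = zeros(1,1,3);                   % beta, one value per channel
scaleFactor = ones(1,1,3);               % gamma, one value per channel

% Per-channel statistics over the spatial and batch dimensions (1, 2, and 4).
mu = mean(X,[1 2 4]);                    % size 1x1x3
sigmaSq = mean((X - mu).^2,[1 2 4]);     % population variance (assumption), size 1x1x3

% Normalize, then scale and offset.
Xhat = (X - mu)./sqrt(sigmaSq + epsilon);
Y = scaleFactor.*Xhat + offset;
```

With a zero offset and unit scale factor, each channel of `Y` has a mean of approximately zero and a variance slightly below one, because of the `epsilon` term in the denominator.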

`[dlY,datasetMu,datasetSigmaSq] = batchnorm(dlX,offset,scaleFactor,datasetMu,datasetSigmaSq)` normalizes each channel of the input mini-batch `dlX` using the mean and variance statistics computed from each channel and applies a scale factor and offset. The function also updates the data set statistics `datasetMu` and `datasetSigmaSq` using the following formula:

$s_n = \varphi s_x + (1 - \varphi) s_{n-1}$

where $s_n$ is the statistic computed over several mini-batches, $s_x$ is the per-channel statistic of the current mini-batch, and $\varphi$ is the decay value for the statistic.

Use this syntax to iteratively update the mean and variance statistics over several mini-batches of data during training. Use the final value of the mean and variance computed over all training mini-batches to normalize data for prediction and classification.
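To make the update rule concrete, the sketch below applies it by hand to the mean of a single channel over three mini-batches. The per-batch means are hypothetical numbers, and the decay is the documented default of `0.1`.

```matlab
decay = 0.1;                       % default 'MeanDecay' value
batchMeans = [0.50 0.75 1.25];     % hypothetical per-batch means of one channel

s = batchMeans(1);                 % start from the statistic of the first mini-batch
for k = 2:numel(batchMeans)
    % s_n = decay*s_x + (1 - decay)*s_{n-1}
    s = decay*batchMeans(k) + (1 - decay)*s;
end
```

After the loop, `s` is `0.5975`. Because the decay is small, the running value moves only slowly away from the first mini-batch mean toward the later, larger means.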
`[___] = batchnorm(___,'DataFormat',FMT)` also specifies the dimension format `FMT` when `dlX` is not a formatted `dlarray`, in addition to the input arguments in previous syntaxes. The output `dlY` is an unformatted `dlarray` with the same dimension order as `dlX`.

`[___] = batchnorm(___,Name,Value)` specifies options using one or more name-value pair arguments in addition to the input arguments in previous syntaxes. For example, `'MeanDecay',0.3` sets the decay rate of the moving average computation.

## Examples

Use `batchnorm` to normalize each channel of a mini-batch and obtain the per-channel normalization statistics.

Create the input data as a single observation of random values with a height and width of four and three channels.

```
height = 4;
width = 4;
channels = 3;
observations = 1;

X = rand(height,width,channels,observations);
dlX = dlarray(X,'SSCB');
```

Create the learnable parameters.

```
offset = zeros(channels,1);
scaleFactor = ones(channels,1);
```

Compute the batch normalization and obtain the statistics of each channel of the batch.

```
[dlY,mu,sigmaSq] = batchnorm(dlX,offset,scaleFactor);
mu
sigmaSq
```

```
mu = 3×1

    0.6095
    0.6063
    0.4619

sigmaSq = 3×1

    0.1128
    0.0880
    0.0805
```

Use the `batchnorm` function to normalize several batches of data and update the statistics of the whole data set after each normalization.

Create three batches of data. The data consists of 10-by-10 random arrays with five channels. Each batch contains 20 observations. The second and third batches are scaled by a multiplicative factor of `1.5` and `2.5`, respectively, so the mean of the data set increases with each batch.

```
height = 10;
width = 10;
channels = 5;
observations = 20;

X1 = rand(height,width,channels,observations);
dlX1 = dlarray(X1,'SSCB');

X2 = 1.5*rand(height,width,channels,observations);
dlX2 = dlarray(X2,'SSCB');

X3 = 2.5*rand(height,width,channels,observations);
dlX3 = dlarray(X3,'SSCB');
```

Create the learnable parameters.

```
offset = zeros(channels,1);
scale = ones(channels,1);
```

Normalize the first batch of data, `dlX1`, using `batchnorm`. Obtain the values of the mean and variance of this batch as outputs.

`[dlY1,mu,sigmaSq] = batchnorm(dlX1,offset,scale);`

Normalize the second batch of data, `dlX2`. Use `mu` and `sigmaSq` as inputs to obtain the values of the combined mean and variance of the data in batches `dlX1` and `dlX2`.

`[dlY2,datasetMu,datasetSigmaSq] = batchnorm(dlX2,offset,scale,mu,sigmaSq);`

Normalize the final batch of data, `dlX3`. Update the data set statistics `datasetMu` and `datasetSigmaSq` to obtain the values of the combined mean and variance of all data in batches `dlX1`, `dlX2`, and `dlX3`.

`[dlY3,datasetMuFull,datasetSigmaSqFull] = batchnorm(dlX3,offset,scale,datasetMu,datasetSigmaSq);`

Observe the change in the mean of each channel as each batch is normalized.

```
plot([mu';datasetMu';datasetMuFull'])
legend({'Channel 1','Channel 2','Channel 3','Channel 4','Channel 5'},'Location','southeast')
xticks([1 2 3])
xlabel('Number of Batches')
xlim([0.9 3.1])
ylabel('Per-Channel Mean')
title('Data Set Mean')
```
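Once the data set statistics have been accumulated, they can be passed to the single-output syntax to normalize new data for prediction. The sketch below is self-contained, so it uses hypothetical stand-in values for the final statistics rather than the ones computed above, and `dlXTest` is an illustrative variable name.

```matlab
% Stand-in values; in practice, use the final data set statistics from training.
channels = 5;
offset = zeros(channels,1);
scale = ones(channels,1);
datasetMuFull = 0.6*ones(channels,1);        % hypothetical final mean
datasetSigmaSqFull = 0.1*ones(channels,1);   % hypothetical final variance

% New data to normalize at prediction time.
dlXTest = dlarray(rand(10,10,channels,4),'SSCB');

% Single output: the supplied statistics are used for normalization as given.
dlYTest = batchnorm(dlXTest,offset,scale,datasetMuFull,datasetSigmaSqFull);
```

Requesting only `dlY` selects the syntax that normalizes with the specified statistics instead of statistics computed from the current mini-batch.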

## Input Arguments

Input data, specified as a `dlarray` with or without dimension labels or a numeric array. When `dlX` is not a formatted `dlarray`, you must specify the dimension label format using `'DataFormat',FMT`. If `dlX` is a numeric array, at least one of `offset` or `scaleFactor` must be a `dlarray`.

`dlX` must have a `'C'` channel dimension.

Data Types: `single` | `double`

Channel offset β, specified as a `dlarray` vector with or without dimension labels or a numeric vector.

If `offset` is a formatted `dlarray`, it must contain a `'C'` dimension of the same size as the `'C'` dimension of the input data.

Data Types: `single` | `double`

Channel scale factor γ, specified as a `dlarray` vector with or without dimension labels or a numeric vector.

If `scaleFactor` is a formatted `dlarray`, it must contain a `'C'` dimension of the same size as the `'C'` dimension of the input data.

Data Types: `single` | `double`

Mean statistic for normalization, specified as a numeric vector of the same length as the `'C'` dimension of the input data.

`mu` is calculated over all `'S'` (spatial), `'B'` (batch), `'T'` (time), and `'U'` (unspecified) dimensions in `dlX` for each channel.

Data Types: `single` | `double`

Variance statistic for normalization, specified as a numeric vector of the same length as the `'C'` dimension of the input data.

`sigmaSq` is calculated over all `'S'` (spatial), `'B'` (batch), `'T'` (time), and `'U'` (unspecified) dimensions in `dlX` for each channel.

Data Types: `single` | `double`

Mean statistic of several batches of data, specified as a numeric vector of the same length as the `'C'` dimension of the input data. To iteratively update the dataset mean over several batches of input data, use the `datasetMu` output of a previous call to `batchnorm` as the `datasetMu` input argument.

Data Types: `single` | `double`

Variance statistic of several batches of data, specified as a numeric vector of the same length as the `'C'` dimension of the input data. To iteratively update the dataset variance over several batches of input data, use the `datasetSigmaSq` output of a previous call to `batchnorm` as the `datasetSigmaSq` input argument.

Data Types: `single` | `double`

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `'MeanDecay',0.3,'VarianceDecay',0.5` sets the decay rate for the moving average computations of the mean and variance of several batches of data to `0.3` and `0.5`, respectively.

Dimension order of unformatted input data, specified as the comma-separated pair consisting of `'DataFormat'` and a character array or string `FMT` that provides a label for each dimension of the data. Each character in `FMT` must be one of the following:

• `'S'` — Spatial

• `'C'` — Channel

• `'B'` — Batch (for example, samples and observations)

• `'T'` — Time (for example, sequences)

• `'U'` — Unspecified

You can specify multiple dimensions labeled `'S'` or `'U'`. You can use the labels `'C'`, `'B'`, and `'T'` at most once.

You must specify `'DataFormat'` when the input data `dlX` is not a formatted `dlarray`.

Example: `'DataFormat','SSCB'`

Data Types: `char` | `string`
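As an illustration of the `'DataFormat'` option, the following sketch normalizes an unformatted `dlarray` by supplying the dimension labels in the call. The array sizes are arbitrary, and the code assumes Deep Learning Toolbox is available.

```matlab
channels = 3;
dlX = dlarray(rand(4,4,channels,2));     % unformatted dlarray, no dimension labels
offset = zeros(channels,1);
scaleFactor = ones(channels,1);

% Label the dimensions as spatial, spatial, channel, batch for this call only.
[dlY,mu,sigmaSq] = batchnorm(dlX,offset,scaleFactor,'DataFormat','SSCB');

% The output stays unformatted, so it carries no dimension labels.
isUnformatted = isempty(dims(dlY));
```

The labels apply only to the computation; the data itself is not converted to a formatted `dlarray`.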

Variance offset for preventing divide-by-zero errors, specified as the comma-separated pair consisting of `'Epsilon'` and a numeric scalar. The specified value must be greater than or equal to `1e-5`. The default value is `1e-5`.

Data Types: `single` | `double`
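The role of the variance offset is easiest to see on a channel whose values are all identical, so that its variance is exactly zero. This plain-array sketch applies the normalization formula by hand with and without the offset; it does not call `batchnorm`.

```matlab
x = 0.5*ones(4,4);                       % one channel whose values are all identical
mu = mean(x,'all');
sigmaSq = mean((x - mu).^2,'all');       % zero variance

xhatNoEps = (x - mu)./sqrt(sigmaSq);     % 0/0 produces NaN in every element

epsilon = 1e-5;
xhatEps = (x - mu)./sqrt(sigmaSq + epsilon);   % 0/sqrt(epsilon) produces 0
```

Without the offset, every normalized value is `NaN`. With the offset, the constant channel maps cleanly to zero.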

Decay value for the moving average computation of the `datasetMu` output, specified as the comma-separated pair consisting of `'MeanDecay'` and a numeric scalar between `0` and `1`. The default value is `0.1`.

Data Types: `single` | `double`

Decay value for the moving average computation of the `datasetSigmaSq` output, specified as the comma-separated pair consisting of `'VarianceDecay'` and a numeric scalar between `0` and `1`. The default value is `0.1`.

Data Types: `single` | `double`
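The decay options can be combined in a single call. In this sketch, the statistics of a first mini-batch seed the running statistics, and a second mini-batch updates them with larger decay values than the default, so the most recent data has more influence. The data sizes and decay values are illustrative, and the code assumes Deep Learning Toolbox is available.

```matlab
channels = 5;
offset = zeros(channels,1);
scaleFactor = ones(channels,1);

dlX1 = dlarray(rand(10,10,channels,20),'SSCB');
dlX2 = dlarray(1.5*rand(10,10,channels,20),'SSCB');

% Statistics of the first mini-batch seed the running statistics.
[~,mu,sigmaSq] = batchnorm(dlX1,offset,scaleFactor);

% Weight the current mini-batch more heavily than the default decay of 0.1.
[dlY2,datasetMu,datasetSigmaSq] = batchnorm(dlX2,offset,scaleFactor,mu,sigmaSq, ...
    'MeanDecay',0.3,'VarianceDecay',0.5);
```

A decay closer to `1` makes the running statistics track the latest mini-batch, while a decay closer to `0` keeps them near their previous values.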

## Output Arguments

Normalized data, returned as a `dlarray`. The output `dlY` has the same underlying data type as the input `dlX`.

If the input data `dlX` is a formatted `dlarray`, `dlY` has the same dimension labels as `dlX`. If the input data is not a formatted `dlarray`, `dlY` is an unformatted `dlarray` with the same dimension order as the input data.

Per-channel mean of the input data, `mu`, returned as a numeric column vector with length equal to the size of the `'C'` dimension of the input data.

Per-channel variance of the input data, `sigmaSq`, returned as a numeric column vector with length equal to the size of the `'C'` dimension of the input data.

Updated mean statistic of several batches of data, returned as a numeric vector with length equal to the size of the `'C'` dimension of the input data. `datasetMu` is returned with the same shape as the input `datasetMu`.

The `datasetMu` output is the moving average computation of the mean statistic for each channel over several batches of input data. `datasetMu` is computed from the channel mean of the input data and the input `datasetMu` using the following formula:

`datasetMu` = `meanDecay` × `currentMu` + (1 – `meanDecay`) × `datasetMu`,

where `currentMu` is the channel mean computed from the input data and the value of `meanDecay` is specified using the `'MeanDecay'` name-value pair argument.

Updated variance statistic of several batches of data, returned as a numeric vector with length equal to the size of the `'C'` dimension of the input data. `datasetSigmaSq` is returned with the same shape as the input `datasetSigmaSq`.

The `datasetSigmaSq` output is the moving average computation of the variance statistic for each channel over several batches of input data. `datasetSigmaSq` is computed from the channel variance of the input data and the input `datasetSigmaSq` using the following formula:

`datasetSigmaSq` = `varianceDecay` × `currentSigmaSq` + (1 – `varianceDecay`) × `datasetSigmaSq`,

where `currentSigmaSq` is the channel variance computed from the input data and the value of `varianceDecay` is specified using the `'VarianceDecay'` name-value pair.

## More About

### Batch Normalization

The `batchnorm` function normalizes each input channel of a mini-batch of data. For more information, see the definition of Batch Normalization Layer on the `batchNormalizationLayer` reference page.

## Extended Capabilities

Introduced in R2019b
