Normalize each channel of mini-batch

The batch normalization operation normalizes each input channel across a mini-batch. To speed up training of convolutional neural networks and reduce the sensitivity to network initialization, use batch normalization between convolution and nonlinear operations such as `relu`.

**Note**

This function applies the batch normalization operation to `dlarray` data. If you want to apply batch normalization within a `layerGraph` object or `Layer` array, use the `batchNormalizationLayer` layer instead.

`[dlY,mu,sigmaSq] = batchnorm(dlX,offset,scaleFactor)` normalizes each channel of the input mini-batch `dlX` using the mean and variance statistics computed from each channel and applies a scale factor and offset.

The normalized activation is calculated using the following formula:

$${\widehat{x}}_{i}=\frac{{x}_{i}-{\mu}_{c}}{\sqrt{{\sigma}_{c}^{2}+\epsilon}}$$

where *x_i* is the input activation, *μ_c* (`mu`) and *σ_c²* (`sigmaSq`) are the per-channel mean and variance, respectively, and *ϵ* is a small constant that keeps the denominator away from zero. `mu` and `sigmaSq` are calculated over all `'S'` (spatial), `'B'` (batch), `'T'` (time), and `'U'` (unspecified) dimensions in `dlX` for each channel. The normalized activation is offset and scaled according to the following formula:

$${y}_{i}=\gamma {\widehat{x}}_{i}+\beta .$$

The offset *β* and scale factor *γ* are specified with the `offset` and `scaleFactor` arguments.
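The two formulas above can be sketched on a plain numeric array without any toolbox functions. This is an illustration only, not the internal implementation of `batchnorm`. It assumes a height-by-width-by-channel-by-batch layout, a hypothetical `epsilon` of `1e-5`, and the population variance (normalized by the number of elements).

```matlab
% Sketch of per-channel batch normalization on a plain array.
% Assumed layout: height x width x channel x batch.
X = reshape(1:16, [2 2 2 2]);     % 2x2 spatial, 2 channels, 2 observations
epsilon = 1e-5;                   % assumed small constant
offset = [0; 0];                  % beta, one value per channel
scaleFactor = [1; 1];             % gamma, one value per channel

numChannels = size(X, 3);
Y = zeros(size(X));
mu = zeros(numChannels, 1);
sigmaSq = zeros(numChannels, 1);

for c = 1:numChannels
    xc = X(:, :, c, :);           % all spatial and batch elements of channel c
    mu(c) = mean(xc(:));
    sigmaSq(c) = var(xc(:), 1);   % population variance (assumption)
    xhat = (xc - mu(c)) ./ sqrt(sigmaSq(c) + epsilon);
    Y(:, :, c, :) = scaleFactor(c) .* xhat + offset(c);
end
```

With a zero offset and unit scale factor, each channel of `Y` has a mean close to zero and a variance close to one.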

The input `dlX` is a formatted `dlarray` with dimension labels. The output `dlY` is a formatted `dlarray` with the same dimension labels as `dlX`.
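A minimal call with a formatted input might look like the following sketch. It assumes Deep Learning Toolbox is installed. The array sizes, the `'SSCB'` format, and the random data are chosen only for illustration.

```matlab
% Requires Deep Learning Toolbox.
numChannels = 3;
X = rand(4, 4, numChannels, 8);        % 4x4 spatial, 3 channels, 8 observations
dlX = dlarray(X, 'SSCB');              % attach dimension labels

offset = zeros(numChannels, 1);        % beta: no shift
scaleFactor = ones(numChannels, 1);    % gamma: no rescaling

% Normalize each channel and return the per-channel batch statistics.
[dlY, mu, sigmaSq] = batchnorm(dlX, offset, scaleFactor);

outputLabels = dims(dlY);              % labels carried over from dlX
numStatistics = numel(mu);             % one mean per channel
```

The zero offset and unit scale factor make the output the plain normalized activation, which is convenient for checking the statistics before learnable parameters are introduced.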

`[dlY,datasetMu,datasetSigmaSq] = batchnorm(dlX,offset,scaleFactor,datasetMu,datasetSigmaSq)` normalizes each channel of the input mini-batch `dlX` using the mean and variance statistics computed from each channel and applies a scale factor and offset. The function also updates the data set statistics `datasetMu` and `datasetSigmaSq` using the following formula:

$${s}_{n}=\varphi {s}_{x}+(1-\varphi ){s}_{n-1}$$

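where *s_n* is the statistic computed over several mini-batches, *s_x* is the per-channel statistic of the current input mini-batch, *s_{n-1}* is the previous value of the statistic, and *φ* is the decay value for the statistic.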

Use this syntax to iteratively update the mean and variance statistics over several mini-batches of data during training. Use the final value of the mean and variance computed over all training mini-batches to normalize data for prediction and classification.
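The update rule can be traced by hand for a single channel. The sketch below uses plain MATLAB arithmetic. The decay value, the per-batch means, and the initial statistic are hypothetical numbers chosen for illustration.

```matlab
% Sketch of s_n = phi*s_x + (1 - phi)*s_{n-1} for one channel.
phi = 0.1;                      % hypothetical decay value
batchMeans = [2.0 4.0 3.0];     % hypothetical means of three mini-batches
s = 0;                          % assumed initial data set statistic

for k = 1:numel(batchMeans)
    % Blend the current mini-batch statistic with the running value.
    s = phi * batchMeans(k) + (1 - phi) * s;
end
```

A small decay value weights the history heavily, so the running statistic moves slowly toward the recent mini-batch values. A larger value tracks the latest mini-batches more closely.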

`[___] = batchnorm(___,'DataFormat',FMT)` also specifies the dimension format `FMT` when `dlX` is not a formatted `dlarray`, in addition to the input arguments in previous syntaxes. The output `dlY` is an unformatted `dlarray` with the same dimension order as `dlX`.

`[___] = batchnorm(___,Name,Value)` specifies options using one or more name-value pair arguments in addition to the input arguments in previous syntaxes. For example, `'MeanDecay',0.3` sets the decay rate of the moving average computation.
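As a rough sketch, name-value pairs can be combined in one call. The example below assumes Deep Learning Toolbox is installed. It passes an unformatted `dlarray` with `'DataFormat'` and sets both decay rates. The initial statistics, the decay values, and the array sizes are illustrative assumptions.

```matlab
% Requires Deep Learning Toolbox.
numChannels = 3;
dlX = dlarray(rand(4, 4, numChannels, 8));   % unformatted input

offset = zeros(numChannels, 1);
scaleFactor = ones(numChannels, 1);
datasetMu = zeros(numChannels, 1);           % assumed initial running mean
datasetSigmaSq = ones(numChannels, 1);       % assumed initial running variance

% Describe the dimension order and choose the decay rates explicitly.
[dlY, datasetMu, datasetSigmaSq] = batchnorm(dlX, offset, scaleFactor, ...
    datasetMu, datasetSigmaSq, ...
    'DataFormat', 'SSCB', ...
    'MeanDecay', 0.3, ...
    'VarianceDecay', 0.3);
```

In a training loop, the returned `datasetMu` and `datasetSigmaSq` would be passed back into the next call so that the statistics accumulate across mini-batches.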

`dlarray` | `dlconv` | `dlfeval` | `dlgradient` | `fullyconnect` | `groupnorm` | `relu`