# gelu

Apply Gaussian error linear unit (GELU) activation

## Syntax

``Y = gelu(X)``
``Y = gelu(X,Approximation=method)``

## Description

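The Gaussian error linear unit (GELU) activation operation weights each input value by the standard Gaussian cumulative distribution function evaluated at that value, that is, by the probability that a standard normal random variable does not exceed it.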

This operation is given by

`$\text{GELU}\left(x\right)=\frac{x}{2}\left(1+\text{erf}\left(\frac{x}{\sqrt{2}}\right)\right),$`

where erf denotes the error function.
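As a rough illustration of the formula above, the following sketch evaluates it directly with the base MATLAB `erf` function on ordinary numeric arrays rather than on `dlarray` data. The helper name `geluRef` is hypothetical and is not a toolbox function.

```matlab
% Reference evaluation of the GELU formula using base MATLAB only.
% geluRef is a hypothetical helper name, not a toolbox function.
geluRef = @(x) (x./2).*(1 + erf(x./sqrt(2)));

x = [-1 0 1];
y = geluRef(x);

% GELU(x) equals x times the standard normal CDF at x, so
% y(1) is about -0.1587, y(2) is 0, and y(3) is about 0.8413.
disp(y)
```

Negative inputs are scaled toward zero rather than clipped, which is what distinguishes GELU from a hard threshold such as ReLU.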

Note

This function applies the GELU operation to `dlarray` data. If you want to apply the GELU activation within a `layerGraph` object or `Layer` array, use the following layer:


`Y = gelu(X)` applies the GELU activation to the input data `X`.

`Y = gelu(X,Approximation=method)` also specifies the approximation method for the GELU operation. For example, `Approximation="tanh"` specifies the tanh approximation of the underlying error function.

## Examples


Create a formatted `dlarray` object containing a batch of 128 28-by-28 images with three channels. Specify the format `"SSCB"` (spatial, spatial, channel, batch).

```
miniBatchSize = 128;
inputSize = [28 28];
numChannels = 3;
X = rand(inputSize(1),inputSize(2),numChannels,miniBatchSize);
X = dlarray(X,"SSCB");
```

View the size and format of the input data.

`size(X)`

```
ans = 1×4

    28    28     3   128
```

`dims(X)`

```
ans = 'SSCB'
```

Apply the GELU activation.

`Y = gelu(X);`

View the size and format of the output.

`size(Y)`

```
ans = 1×4

    28    28     3   128
```

`dims(Y)`

```
ans = 'SSCB'
```

## Input Arguments


Input data `X`, specified as a formatted or unformatted `dlarray` object.

Approximation method, set with the `Approximation` name-value argument and specified as one of these values:

• `"none"` — Do not use approximation.

• `"tanh"` — Approximate the underlying error function using

`$\text{erf}\left(\frac{x}{\sqrt{2}}\right)\approx \text{tanh}\left(\sqrt{\frac{2}{\pi }}\left(x+0.044715{x}^{3}\right)\right).$`

Tip

In MATLAB®, computing the tanh approximation is typically less accurate, and, for large input sizes, slower than computing the GELU activation without using an approximation. Use the tanh approximation when you want to reproduce models that use this approximation, such as BERT and GPT-2.
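To get a feel for how close the two options are, a minimal sketch such as the following compares them on sample data. It assumes Deep Learning Toolbox is installed; the variable names and the input range are illustrative choices only.

```matlab
% Compare the exact GELU with the tanh approximation on sample data.
X = dlarray(linspace(-4,4,101));

Yexact = gelu(X);
Ytanh = gelu(X,Approximation="tanh");

% Largest absolute difference between the two results over this range.
maxDiff = max(abs(extractdata(Yexact - Ytanh)));
disp(maxDiff)
```

The difference is small in absolute terms, but it can matter when you need to match the outputs of a pretrained model that was trained with one particular variant.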

## Output Arguments


GELU activations `Y`, returned as a `dlarray` object. The output `Y` has the same underlying data type as the input `X`.

If the input data `X` is a formatted `dlarray` object, then `Y` has the same dimension format as `X`. If the input data is not a formatted `dlarray` object, then `Y` is an unformatted `dlarray` object with the same dimension order as the input data.
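The following sketch checks these properties for an unformatted single-precision input. It assumes Deep Learning Toolbox is installed; the array size is arbitrary.

```matlab
% Unformatted single-precision input: no dimension labels are attached.
X = dlarray(rand(4,3,"single"));
Y = gelu(X);

% Per the description above, the output should keep the underlying
% type of the input, remain unformatted, and have the same size.
underlyingType(Y)
dims(Y)
size(Y)
```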

## Algorithms


### Gaussian Error Linear Unit Activation

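The Gaussian error linear unit (GELU) activation operation weights each input value by the standard Gaussian cumulative distribution function evaluated at that value, that is, by the probability that a standard normal random variable does not exceed it.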

This operation is given by

`$\text{GELU}\left(x\right)=\frac{x}{2}\left(1+\text{erf}\left(\frac{x}{\sqrt{2}}\right)\right),$`

where erf denotes the error function given by

`$\text{erf}\left(x\right)=\frac{2}{\sqrt{\pi }}{\int }_{0}^{x}{e}^{-{t}^{2}}dt.$`

When the `Approximation` option is `"tanh"`, the software approximates the error function using

`$\text{erf}\left(\frac{x}{\sqrt{2}}\right)\approx \text{tanh}\left(\sqrt{\frac{2}{\pi }}\left(x+0.044715{x}^{3}\right)\right).$`
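Substituting this approximation into the GELU definition gives the form in which the tanh variant is usually written. This is a worked substitution, not a statement about how the software evaluates the operation internally:

`$\text{GELU}\left(x\right)\approx \frac{x}{2}\left(1+\text{tanh}\left(\sqrt{\frac{2}{\pi }}\left(x+0.044715{x}^{3}\right)\right)\right).$`

For example, at x = 1 the approximation gives about 0.8412, compared with about 0.8413 for the exact expression.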

 Hendrycks, Dan, and Kevin Gimpel. "Gaussian error linear units (GELUs)." Preprint, submitted June 27, 2016. https://arxiv.org/abs/1606.08415