# roialign

Non-quantized ROI pooling of `dlarray` data

Since R2021b

## Syntax

`dlY = roialign(dlX,boxes,outputSize)`

`dlY = roialign(dlX,boxes,outputSize,Name=Value)`

## Description

The ROI align operation pools each rectangular region of interest (ROI) into a fixed-size grid of bins without quantizing the grid points to the nearest pixel. Instead, the function uses bilinear interpolation to compute the value at each grid point.

Given input data of size [H W C N], where C is the number of channels and N is the number of observations, the pooled deep learning data has size [h w C sum(M)], where [h w] is the specified output size. M is a vector of length N, where M(i) is the number of ROIs associated with the i-th observation.
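As a quick check of the size rule above, the following plain MATLAB sketch predicts the size of `dlY` from the box batch indices without calling `roialign`. The variables `H`, `W`, `C`, `N`, `boxes`, and `expectedSize` are illustrative; the sketch assumes the 5-row `boxes` layout described under Input Arguments, with the batch index in row 5.

```matlab
% Illustrative sizes for dlX, which has size [H W C N]
H = 10; W = 10; C = 3; N = 2;
outputSize = [3 3];                 % [h w]
% Hypothetical boxes: one ROI on observation 1, two ROIs on observation 2.
% Each row is [x_start y_start x_end y_end batchIdx]; transpose to 5 rows.
boxes = [2 2 4 4 1;
         1 1 5 5 2;
         3 2 8 6 2]';
% M(i) counts the ROIs whose batch index (row 5) equals i
M = accumarray(boxes(5,:)', 1, [N 1])';
% Predicted size of the pooled output: [h w C sum(M)]
expectedSize = [outputSize C sum(M)];
disp(expectedSize)
```

With a real `dlX` of size [10 10 3 2], `size(roialign(dlX,boxes,outputSize))` should agree with `expectedSize`.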

Note

To perform ROI pooling within a `layerGraph` (Deep Learning Toolbox) object or `Layer` (Deep Learning Toolbox) array, use `roiAlignLayer`.

This function requires Deep Learning Toolbox™.


`dlY = roialign(dlX,boxes,outputSize)` performs a pooling operation along the spatial dimensions of the input `dlX` for each bounding box in `boxes`. The spatial size of each pooled output in `dlY` is `outputSize`.

`dlY = roialign(dlX,boxes,outputSize,Name=Value)` specifies options using one or more name-value arguments.

## Examples


Create a 4-D formatted `dlarray` object that simulates a batch of two RGB images.

`X = dlarray(rand(10,10,3,2),"SSCB");`

Specify the position and batch index of one bounding box.

```matlab
startXY = [2 2];
endXY = [4 4];
batchIdx = 1;
rois = [startXY endXY batchIdx]';
```

Perform ROI pooling with an output size of 3-by-3.

`Y = roialign(X,rois,[3 3])`
```
Y = 
  3(S) x 3(S) x 3(C) x 1(B) single dlarray

(:,:,1) =

    0.7464    0.3069    0.1780
    0.9212    0.8491    0.4677
    0.7303    0.9057    0.3840

(:,:,2) =

    0.3024    0.6428    0.6594
    0.1542    0.0046    0.1228
    0.6295    0.5182    0.3304

(:,:,3) =

    0.4915    0.7590    0.5035
    0.4574    0.4302    0.5453
    0.2960    0.2666    0.5389
```

## Input Arguments


`dlX`: Deep learning data to pool, specified as a 4-D formatted `dlarray` (Deep Learning Toolbox) object with a data format of "SSCB".

`boxes`: Bounding boxes, specified as a 5-by-K numeric matrix, where K is the total number of bounding boxes, sum(M). Each bounding box is formatted as a column vector of the form [x_start; y_start; x_end; y_end; batchIdx], where:

• x_start and y_start specify the (x,y) coordinates of the upper-left corner of the rectangle.

• x_end and y_end specify the (x,y) coordinates of the bottom-right corner of the rectangle.

• batchIdx specifies the index of the observation corresponding to the rectangle.

By default, the coordinates in `boxes` are in the same coordinate space and scale as the input deep learning data `dlX`. If they are not, use the `ROIScale` name-value argument.
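When ROIs come from several observations, you can stack them into one `boxes` matrix and record each box's observation in row 5. The sketch below is plain MATLAB; the per-observation arrays `cornersObs1` and `cornersObs2` are hypothetical, and the ordering checks at the end are only a consistency test for upper-left and lower-right corners, not validation performed by `roialign`.

```matlab
% Hypothetical corners, one row per box: [x_start y_start x_end y_end]
cornersObs1 = [2 2 4 4];            % boxes on observation 1
cornersObs2 = [1 1 5 5;
               3 2 8 6];            % boxes on observation 2
corners = [cornersObs1; cornersObs2];
% Repeat each observation index once per box that belongs to it
batchIdx = repelem([1; 2], [size(cornersObs1,1); size(cornersObs2,1)]);
% Each column of boxes is [x_start; y_start; x_end; y_end; batchIdx]
boxes = [corners batchIdx]';
% Consistency checks: 5 rows, and end corners not before start corners
assert(size(boxes,1) == 5)
assert(all(boxes(3,:) >= boxes(1,:)) && all(boxes(4,:) >= boxes(2,:)))
```

The resulting 5-by-3 `boxes` matrix can then be passed as the second argument of `roialign`.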

`outputSize`: Pooled output size, specified as a vector of two positive integers `[h w]`, where `h` is the height and `w` is the width.

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: `dlY = roialign(dlX,boxes,outputSize,ROIScale=2)` scales the input ROIs by a factor of 2.

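`ROIScale`: Ratio of the scale of the input feature map to that of the ROI coordinates, specified as a positive scalar. This ratio is the factor used to scale the input ROIs to the size of the input feature map. The default, 1, corresponds to boxes that are already in the coordinate space of `dlX`.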
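A common case is boxes given in image pixels while `dlX` is a downsampled feature map. The sketch below computes the ratio for a hypothetical 224-by-224 image and 14-by-14 feature map, and shows the scaled coordinates under the assumption that scaling multiplies the four corner coordinates and leaves the batch index unchanged. The exact internal convention of `roialign` (for example, any half-pixel offset) is not stated here.

```matlab
% Hypothetical sizes: boxes in image pixels, dlX is a 16x downsampled map
imageSize = [224 224];
featureSize = [14 14];
roiScale = featureSize(1) / imageSize(1);    % feature-map scale / ROI scale = 1/16
boxesImage = [32 48 96 160 1]';              % [x_start; y_start; x_end; y_end; batchIdx]
% Assumption: only the four coordinates are multiplied by the ratio
boxesFeature = boxesImage;
boxesFeature(1:4,:) = roiScale * boxesImage(1:4,:);
disp(boxesFeature')                          % expected values: 2 3 6 10 1
```

In practice you would pass the unscaled boxes and let the function apply the ratio, for example `roialign(dlX,boxesImage,[7 7],ROIScale=roiScale)`.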

`SamplingRatio`: Number of samples in each pooled bin, specified as `"auto"` or a row vector of two positive integers. The two elements are the number of vertical and horizontal samples, respectively.

If you do not specify the sampling ratio, or specify it as `"auto"`, then the number of vertical samples is `ceil(roiHeight/outputHeight)` and the number of horizontal samples is `ceil(roiWidth/outputWidth)`, where `roiHeight` and `roiWidth` are the height and width of the ROI, and `[outputHeight outputWidth]` is `outputSize`.
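The default sample counts follow directly from the two `ceil` formulas above. This worked example uses a hypothetical ROI that is 10 units tall and 7 units wide (in feature-map units) and a 3-by-3 output; how `roialign` measures the ROI height and width from the box corners is not specified here.

```matlab
% Hypothetical ROI dimensions in feature-map units
roiHeight = 10;
roiWidth = 7;
outputSize = [3 3];                              % [outputHeight outputWidth]
% Default sample counts per bin, from the formulas above
numVertical = ceil(roiHeight / outputSize(1));   % ceil(3.33) = 4
numHorizontal = ceil(roiWidth / outputSize(2));  % ceil(2.33) = 3
samplingRatio = [numVertical numHorizontal];
% Each pooled bin averages this many interpolated samples
samplesPerBin = prod(samplingRatio);             % 12
```

Larger ROIs relative to `outputSize` therefore get more samples per bin, so each bin's average covers its area more densely.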

Data Types: `double` | `char`

## Output Arguments


`dlY`: Pooled deep learning data, returned as a 4-D formatted `dlarray` (Deep Learning Toolbox) object with a data format of "SSCB".

An ROI align operation returns a fixed-size feature map for every rectangular ROI in an input `dlarray`. The function first partitions each ROI into an h-by-w grid of bins, where `[h w]` is `outputSize`, without quantizing the grid points to the nearest pixel. Within each bin, the function samples a grid of points whose dimensions are given by `SamplingRatio` and infers the value at each sample point using bilinear interpolation. The average of the sampled values in a bin is returned as the output value of that pooled bin.
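The steps above can be sketched for a single channel and a single ROI in plain MATLAB. This is a simplified illustration, not the toolbox implementation. It assumes that `X` is an H-by-W matrix, that `box = [x_start y_start x_end y_end]` uses 1-based pixel coordinates with x as the column and y as the row, that samples sit at the centers of a regular subgrid inside each bin, and that sample points are clamped to the map; `roialign` may use different coordinate and border conventions. All variable names are hypothetical.

```matlab
% Test map whose value equals its column index, so results are easy to check
X = repmat(1:10, 10, 1);
box = [2 2 4 4];                % [x_start y_start x_end y_end]
outputSize = [3 3];             % [h w] grid of bins
samplingRatio = [2 2];          % [vertical horizontal] samples per bin
[H, W] = size(X);
% Bin sizes stay fractional: no rounding to whole pixels
binH = (box(4) - box(2)) / outputSize(1);
binW = (box(3) - box(1)) / outputSize(2);
Y = zeros(outputSize);
for binRow = 1:outputSize(1)
    for binCol = 1:outputSize(2)
        % Sample coordinates at subgrid centers inside bin (binRow, binCol)
        ys = box(2) + (binRow - 1) * binH + ((1:samplingRatio(1)) - 0.5) * binH / samplingRatio(1);
        xs = box(1) + (binCol - 1) * binW + ((1:samplingRatio(2)) - 0.5) * binW / samplingRatio(2);
        [Xq, Yq] = meshgrid(xs, ys);
        % Clamp to the map so interp2 never returns NaN at the borders
        Xq = min(max(Xq, 1), W);
        Yq = min(max(Yq, 1), H);
        % Bilinear interpolation at each sample, then average within the bin
        vals = interp2(X, Xq, Yq, 'linear');
        Y(binRow, binCol) = mean(vals(:));
    end
end
disp(Y)
```

Because bilinear interpolation reproduces a linear ramp exactly, each output value here equals the x coordinate of its bin center, which makes the sketch easy to check by hand.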