
predictPose

Estimate object pose using Pose Mask R-CNN deep learning network

Since R2024a

Description

poses = predictPose(net,I,depthImage,intrinsics) returns the 6-degrees-of-freedom (6-DoF) poses of objects within a single image or a batch of images I, using a trained Pose Mask R-CNN network.

[poses,labels,scores,bboxes] = predictPose(___) also returns the labels assigned to the detected objects, the detection score for each detected object, and the bounding box location of each detected object, using the input arguments from the previous syntax.

[poses,labels,scores,bboxes,masks] = predictPose(___) performs instance segmentation of the objects, and returns the binary object masks, masks.

[___] = predictPose(___,Name=Value) specifies options using additional name-value arguments. For example, Threshold=0.7 specifies the detection threshold as 0.7.
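For example, the following is a minimal usage sketch. It assumes net is a trained posemaskrcnn object, and the image file names, depth conversion, and calibration values are hypothetical placeholders for your own data:

    % net is a trained posemaskrcnn object (a pretrained model installed with
    % the add-on, or a network you trained on your own data).

    % Read a color image and its registered depth map (hypothetical files).
    I = imread("scene_rgb.png");
    depthImage = single(imread("scene_depth.png"));  % convert and scale per your sensor

    % Camera intrinsics: focal length [fx fy], principal point [cx cy], and
    % image size [rows cols] (hypothetical calibration values).
    intrinsics = cameraIntrinsics([610 610],[320 240],size(I,[1 2]));

    % Estimate 6-DoF poses, along with labels, scores, boxes, and masks.
    [poses,labels,scores,bboxes,masks] = predictPose(net,I,depthImage,intrinsics, ...
        Threshold=0.7);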

Note

This functionality requires Deep Learning Toolbox™ and the Computer Vision Toolbox™ Model for Pose Mask R-CNN 6-DoF Object Pose Estimation. You can install the Computer Vision Toolbox Model for Pose Mask R-CNN 6-DoF Object Pose Estimation from Add-On Explorer. For more information about installing add-ons, see Get and Manage Add-Ons.

Input Arguments


Pose Mask R-CNN pose estimation network, specified as a posemaskrcnn object.

Image to segment, specified as a numeric matrix or array. The size of this argument depends on the number of images specified and whether they are color or grayscale.

Image Type and Number                 | Data Format
Single grayscale image                | 2-D matrix of size H-by-W
Single color image                    | 3-D array of size H-by-W-by-3
Batch of B grayscale or color images  | 4-D array of size H-by-W-by-C-by-B. The number of color channels C is 1 for grayscale images and 3 for color images.

The height H and width W of each image must be greater than or equal to the input height h and width w of the network.

Tip

For best network performance, use input image data of the same size that the network has been trained on.
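For example, a minimal sketch of assembling two same-size color images into the H-by-W-by-3-by-B batch format (file names are hypothetical):

    I1 = imread("scene1_rgb.png");   % hypothetical files, all the same size
    I2 = imread("scene2_rgb.png");
    I  = cat(4,I1,I2);               % H-by-W-by-3-by-2 batch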

Depth map for estimating the 3-D pose, specified as a numeric matrix or array. The size of this argument depends on the number of depth maps specified.

Number of Depth Maps   | Data Format
Single depth map       | 2-D numeric matrix of size H-by-W
Batch of B depth maps  | 4-D array of size H-by-W-by-1-by-B

The height H and width W of each depth map must equal the corresponding dimensions of I, and the number of depth maps must match the number of images specified.
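A matching sketch for the depth maps of the same two scenes (the file names and the conversion to single are assumptions about your sensor data):

    D1 = single(imread("scene1_depth.png"));
    D2 = single(imread("scene2_depth.png"));
    depthImage = cat(4,D1,D2);       % H-by-W-by-1-by-2 batch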

Camera intrinsic parameters, specified as a cameraIntrinsics object or a B-by-1 cell array of cameraIntrinsics objects. If you specify this value as a scalar, the function applies the same camera intrinsic parameters to every input image. If you specify this value as a cell array, the number of cameraIntrinsics objects must match the number of input images B.
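For example, a sketch of constructing this input with hypothetical calibration values:

    % Focal length [fx fy], principal point [cx cy], and image size [rows cols].
    intrinsics = cameraIntrinsics([610 610],[320 240],[480 640]);

    % For a batch of B images from different cameras, pass one cameraIntrinsics
    % object per image in a B-by-1 cell array.
    intrinsicsBatch = {intrinsics; intrinsics};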

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: predictPose(net,I,depthImage,intrinsics,Threshold=0.7) specifies the detection threshold as 0.7.

Options for All Image Formats


Object detection threshold, specified as a numeric scalar in the range [0, 1]. The Pose Mask R-CNN network does not use object detections with scores less than the threshold value for pose estimation. Increase this value to reduce false positives.

Maximum number of strongest region proposals, specified as a positive integer or Inf. Reduce this value to increase processing speed at the cost of detection accuracy. To use all region proposals, specify this value as Inf.

Strongest bounding box threshold per class, specified as a positive numeric scalar in the range [0, 1]. The strongest bounding boxes per class are returned when their confidence scores are higher than this value. To select these boxes, predictPose uses the selectStrongestBboxMulticlass function, which uses nonmaximal suppression to eliminate overlapping bounding boxes with the same class label based on their confidence scores.

Note

predictPose first performs nonmaximal suppression using the selectStrongestBboxMulticlass function and the threshold value specified by SelectStrongestMulticlassThreshold. In the second step, predictPose performs nonmaximal suppression using the selectStrongestBbox function and the threshold value specified by SelectStrongestThreshold. You can manually determine the optimal threshold values for your application by using a validation data subset when training on a custom data set.

Strongest bounding box threshold per object, specified as a positive numeric scalar in the range [0, 1]. The strongest bounding boxes per object are returned when their confidence scores are higher than this value. To select these boxes, predictPose uses the selectStrongestBbox function, which uses nonmaximal suppression to eliminate overlapping bounding boxes based on their confidence scores across all classes.

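For example, a sketch that tightens both suppression stages; the threshold values shown are illustrative, not recommendations:

    [poses,labels,scores,bboxes] = predictPose(net,I,depthImage,intrinsics, ...
        SelectStrongestMulticlassThreshold=0.6, ...
        SelectStrongestThreshold=0.6);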

Minimum size of an object-containing region, in pixels, specified as a two-element numeric vector of the form [height width]. MinSize is the size of the smallest object that the trained detector can detect. Specify larger values for this argument to reduce computation time.

Maximum size of an object-containing region, in pixels, specified as a two-element numeric vector of the form [height width].

To reduce computation time, set this value to the known maximum region size for the objects being detected in the image. By default, MaxSize uses the height and width of the input image I.
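For example, a sketch that limits region proposals to an assumed object size range (the pixel values are illustrative):

    poses = predictPose(net,I,depthImage,intrinsics, ...
        MinSize=[32 32],MaxSize=[256 256]);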

Hardware resource for processing images with the network, specified as "auto", "gpu", or "cpu".

ExecutionEnvironment | Description
"auto"               | Use a GPU if available. Otherwise, use the CPU. The use of a GPU requires Parallel Computing Toolbox™ and a CUDA®-enabled NVIDIA® GPU. For information about the supported compute capabilities, see GPU Computing Requirements (Parallel Computing Toolbox).
"gpu"                | Use the GPU. If a suitable GPU is not available, the function returns an error message.
"cpu"                | Use the CPU.

Options for Batch Inputs


Number of observations returned in each batch, specified as a positive integer.

Output Arguments


Object poses, returned as an M-by-1 vector of rigidtform3d objects or a B-by-1 cell array. The value of this output depends on the number of input images B.

Image Type         | poses Value
Single image       | M-by-1 vector of rigidtform3d objects, where M is the number of objects detected in the image.
Batch of B images  | B-by-1 cell array, where each cell contains an M-by-1 vector of rigidtform3d objects for the corresponding image.
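For a single input image, a minimal sketch of reading out the rotation and translation of the first detected object from its rigidtform3d object:

    T = poses(1);           % rigidtform3d pose of the first detection
    R = T.R;                % 3-by-3 rotation matrix
    t = T.Translation;      % 1-by-3 translation vector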

Object labels, returned as an M-by-1 categorical vector or a B-by-1 cell array. When I is a single image, labels is an M-by-1 categorical vector, where M is the number of detected objects in the image.

When I is a batch of B images, labels is a B-by-1 cell array in which each cell contains an M-by-1 categorical vector with the labels of the objects detected in the corresponding image.

Detection confidence scores, returned as an M-by-1 numeric vector, or a B-by-1 cell array. When I is a single image, scores is an M-by-1 numeric vector, where M is the number of detected objects in the image.

When I is a batch of B images, then scores is a B-by-1 cell array. Each element is an M-by-1 numeric vector with the scores of the objects detected in the corresponding image.

The score for each object is in the range [0, 1]. A higher score indicates higher confidence in the detection.

Locations of detected objects within the input image, returned as an M-by-4 numeric matrix or a B-by-1 cell array. When I is a single image, bboxes is an M-by-4 numeric matrix, where M is the number of detected objects in the image. Each row of the matrix is of the form [x y width height], where x and y specify the upper-left corner of the corresponding bounding box, and width and height specify its size in pixels.

When I is a batch of B images, bboxes is a B-by-1 cell array in which each cell contains an M-by-4 numeric matrix with the bounding boxes of the objects detected in the corresponding image.

Object masks, returned as an H-by-W-by-M logical array or a B-by-1 cell array. When I is a single image, masks is an H-by-W-by-M logical array, where H and W are the height and width of the input image, respectively, and M is the number of detected objects in the image.

When I is a batch of B images, masks is a B-by-1 cell array in which each cell contains an H-by-W-by-M logical array with the masks for the corresponding image.
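For a single input image, one possible sketch of overlaying the returned boxes, labels, scores, and masks on the image, using the Computer Vision Toolbox functions insertObjectAnnotation and insertObjectMask (assumes at least one detection):

    labelText = string(labels) + ": " + compose("%.2f",scores);
    annotated = insertObjectAnnotation(I,"rectangle",bboxes,labelText);
    overlaid  = insertObjectMask(annotated,masks);
    imshow(overlaid)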


Version History

Introduced in R2024a