rlIsDoneFunction

Is-done function approximator object for neural network-based environment

Since R2022a

Description

When creating a neural network-based environment using rlNeuralNetworkEnvironment, you can specify the is-done function approximator using an rlIsDoneFunction object. Do so when you do not know a ground-truth termination signal for your environment.

The is-done function approximator object uses a deep neural network as its internal approximation model to predict the termination signal for the environment from one of the following input combinations:

Observations, actions, and next observations
Observations and actions
Actions and next observations
Next observations

Creation

Syntax

isdFcnAppx = rlIsDoneFunction(net,observationInfo,actionInfo,Name=Value)

Description

isdFcnAppx = rlIsDoneFunction(net,observationInfo,actionInfo,Name=Value) creates the is-done function approximator object isdFcnAppx using the deep neural network net and sets the ObservationInfo and ActionInfo properties.

When creating an is-done function approximator, you must specify the names of the deep neural network inputs using one of the following combinations of name-value pair arguments.

ObservationInputNames, ActionInputNames, and NextObservationInputNames
ObservationInputNames and ActionInputNames
ActionInputNames and NextObservationInputNames
NextObservationInputNames

You can also specify the UseDeterministicPredict and UseDevice properties using optional name-value pair arguments. For example, to use a GPU for prediction, specify UseDevice="gpu".
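The following is a minimal sketch, not part of the original page, of the second input combination (observations and actions). It assumes the CartPole-Continuous predefined environment used in the example on this page. The layer names "obsIn", "actIn", and "concat" and the layer sizes are arbitrary illustrative choices. Passing an empty cell array for the unused next-observation input follows the pattern of the example on this page.

```matlab
% Specifications from a predefined environment (as in the example on this page)
env = rlPredefinedEnv("CartPole-Continuous");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Two input paths: current observation and current action (illustrative names)
obsPath = featureInputLayer(obsInfo.Dimension(1),Name="obsIn");
actPath = featureInputLayer(actInfo.Dimension(1),Name="actIn");

% Concatenate both inputs, then output two class probabilities
commonPath = [
    concatenationLayer(1,2,Name="concat")
    fullyConnectedLayer(64)
    reluLayer
    fullyConnectedLayer(2)
    softmaxLayer(Name="isdone")
    ];

% Assemble the graph and connect each input to the concatenation layer
lg = layerGraph(obsPath);
lg = addLayers(lg,actPath);
lg = addLayers(lg,commonPath);
lg = connectLayers(lg,"obsIn","concat/in1");
lg = connectLayers(lg,"actIn","concat/in2");
net = dlnetwork(lg);

% Map the network inputs to the current observation and action
isdFcnAppx = rlIsDoneFunction(net,obsInfo,actInfo, ...
    ObservationInputNames="obsIn", ...
    ActionInputNames="actIn");

% The next observation is not used, so pass an empty cell array for it
obs = rand(obsInfo.Dimension);
act = rand(actInfo.Dimension);
isDone = predict(isdFcnAppx,{obs},{act},{})
```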


Input Arguments


`net` — Deep neural network
`dlnetwork` object

Deep neural network with a scalar output value, specified as a dlnetwork object.

The input layer names for this network must match the input names specified using the ObservationInputNames, ActionInputNames, and NextObservationInputNames arguments. The dimensions of each input layer must match the dimensions of the corresponding specification: observation and next observation inputs correspond to ObservationInfo, and action inputs correspond to ActionInfo.
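As a hedged sketch, you can check these requirements before creating the approximator by inspecting the InputNames property of the dlnetwork object and the InputSize property of its input layer. The small network and the layer name "nextState" below are illustrative assumptions.

```matlab
env = rlPredefinedEnv("CartPole-Continuous");
obsInfo = getObservationInfo(env);

% Minimal network with a single next-observation input (illustrative)
net = dlnetwork([
    featureInputLayer(obsInfo.Dimension(1),Name="nextState")
    fullyConnectedLayer(2)
    softmaxLayer(Name="isdone")
    ]);

% These are the names to pass to rlIsDoneFunction
inputNames = string(net.InputNames);
disp(inputNames)

% The input layer size must equal the observation dimension
inLayer = net.Layers(1);
assert(inLayer.InputSize == obsInfo.Dimension(1), ...
    "Input layer size does not match the observation specification.")
```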

Name-Value Arguments


Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: ObservationInputNames="velocity"

`ObservationInputNames` — Observation input layer names
string | string array

Observation input layer names, specified as a string or string array. Specify ObservationInputNames when you expect the termination signal to depend on the current environment observation.

The number of observation input names must match the length of ObservationInfo, and the order of the names must match the order of the specifications in ObservationInfo.

`ActionInputNames` — Action input layer names
string | string array

Action input layer names, specified as a string or string array. Specify ActionInputNames when you expect the termination signal to depend on the current action value.

The number of action input names must match the length of ActionInfo, and the order of the names must match the order of the specifications in ActionInfo.

`NextObservationInputNames` — Next observation input layer names
string | string array

Next observation input layer names, specified as a string or string array. Specify NextObservationInputNames when you expect the termination signal to depend on the next environment observation.

The number of next observation input names must match the length of ObservationInfo, and the order of the names must match the order of the specifications in ObservationInfo.
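A minimal sketch of the ordering rule for an environment with two observation channels follows. The channel dimensions, the layer names "nextObs1" and "nextObs2", and the layer sizes are hypothetical. The names are listed in the same order as the elements of obsInfo, and the next observation is passed as one cell per channel in that same order.

```matlab
% Two observation channels and one action channel (illustrative dimensions)
obsInfo = [rlNumericSpec([2 1]) rlNumericSpec([1 1])];
actInfo = rlNumericSpec([1 1]);

% One input layer per next-observation channel, sized like obsInfo(1) and obsInfo(2)
in1 = featureInputLayer(2,Name="nextObs1");
in2 = featureInputLayer(1,Name="nextObs2");
commonPath = [
    concatenationLayer(1,2,Name="concat")
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(2)
    softmaxLayer
    ];

lg = layerGraph(in1);
lg = addLayers(lg,in2);
lg = addLayers(lg,commonPath);
lg = connectLayers(lg,"nextObs1","concat/in1");
lg = connectLayers(lg,"nextObs2","concat/in2");
net = dlnetwork(lg);

% Names in the same order as the elements of obsInfo
isdFcnAppx = rlIsDoneFunction(net,obsInfo,actInfo, ...
    NextObservationInputNames=["nextObs1","nextObs2"]);

% One cell per observation channel, in the same order
nextObs = {rand(2,1),rand(1,1)};
isDone = predict(isdFcnAppx,{},{},nextObs)
```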

Properties


`ObservationInfo` — Observation specifications
`rlNumericSpec` object | array of `rlNumericSpec` objects

Observation specifications, specified as an rlNumericSpec object or an array of such objects. Each element in the array defines the properties of an environment observation channel, such as its dimensions, data type, and name.

When you create the approximator object, the constructor function sets the ObservationInfo property to the input argument observationInfo.

You can extract observationInfo from an existing environment, function approximator, or agent using getObservationInfo. You can also construct the specifications manually using rlNumericSpec.

Example: [rlNumericSpec([2 1]) rlNumericSpec([1 1])]
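As a sketch of constructing the specifications manually, the following builds a two-channel observation specification like the one in the example above, with a name on each channel and limits on the first channel only. The names and limit values are illustrative assumptions.

```matlab
% Position channel with lower and upper limits (illustrative values)
posSpec = rlNumericSpec([2 1],LowerLimit=-1,UpperLimit=1);
posSpec.Name = "position";

% Velocity channel without limits
velSpec = rlNumericSpec([1 1]);
velSpec.Name = "velocity";

% Array of specifications, one element per observation channel
obsInfo = [posSpec velSpec];
```

Defining both limits matters if you later want a rescaling normalization on that channel, as described under Normalization.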

`ActionInfo` — Action specifications
`rlFiniteSetSpec` object | `rlNumericSpec` object

Action specifications, specified either as an rlFiniteSetSpec (for discrete action spaces) or rlNumericSpec (for continuous action spaces) object. This object defines the properties of the environment action channel, such as its dimensions, data type, and name.

Note

For this approximator object, only one action channel is allowed.

When you create the approximator object, the constructor function sets the ActionInfo property to the input argument actionInfo.

You can extract actionInfo from an existing environment or agent using getActionInfo. You can also construct the specifications manually using rlFiniteSetSpec or rlNumericSpec.

Example: rlNumericSpec([2 1])
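A short sketch of the two kinds of action specification follows. The force values are illustrative; only one of these objects would be passed as actionInfo, because this approximator allows a single action channel.

```matlab
% Continuous action: a scalar bounded between -10 and 10 (illustrative values)
actInfoCont = rlNumericSpec([1 1],LowerLimit=-10,UpperLimit=10);

% Discrete action: the action takes one of two values
actInfoDisc = rlFiniteSetSpec([-10 10]);
```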

`Normalization` — Normalization method
`"none"` (default) | string array

Normalization method, returned as an array in which each element (one for each input channel defined in the ObservationInfo and ActionInfo properties, in that order) is one of the following values:

"none" — Do not normalize the input.
"rescale-zero-one" — Normalize the input by rescaling it to the interval between 0 and 1. The normalized input Y is (U–LowerLimit)./(UpperLimit–LowerLimit), where U is the nonnormalized input. Note that nonnormalized input values lower than LowerLimit result in normalized values lower than 0. Similarly, nonnormalized input values higher than UpperLimit result in normalized values higher than 1. Here, UpperLimit and LowerLimit are the corresponding properties defined in the specification object of the input channel.
"rescale-symmetric" — Normalize the input by rescaling it to the interval between –1 and 1. The normalized input Y is 2(U–LowerLimit)./(UpperLimit–LowerLimit) – 1, where U is the nonnormalized input. Note that nonnormalized input values lower than LowerLimit result in normalized values lower than –1. Similarly, nonnormalized input values higher than UpperLimit result in normalized values higher than 1. Here, UpperLimit and LowerLimit are the corresponding properties defined in the specification object of the input channel.
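The two rescaling formulas above can be checked with plain arithmetic. In this sketch the limits and input values are hypothetical, and the second input element deliberately exceeds its UpperLimit to show that the result falls outside the nominal interval.

```matlab
% Hypothetical limits, as they might appear in a specification object
LowerLimit = [-2; 0];
UpperLimit = [ 2; 10];

% Nonnormalized input; the second value exceeds UpperLimit on purpose
U = [0; 12];

% "rescale-zero-one": maps [LowerLimit, UpperLimit] to [0, 1]
Y01 = (U - LowerLimit)./(UpperLimit - LowerLimit)

% "rescale-symmetric": maps [LowerLimit, UpperLimit] to [-1, 1]
Ysym = 2*(U - LowerLimit)./(UpperLimit - LowerLimit) - 1
```

Here Y01 is [0.5; 1.2] and Ysym is [0; 1.4]; the second element of each lies above the upper end of its interval because U(2) is above UpperLimit(2).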

Note

When you specify the Normalization property of rlAgentInitializationOptions, normalization is applied only to the approximator input channels corresponding to rlNumericSpec specification objects in which both the UpperLimit and LowerLimit properties are defined. After you create the agent, you can use setNormalizer to assign normalizers that use any normalization method. For more information on normalizer objects, see rlNormalizer.

Example: "rescale-symmetric"

`UseDeterministicPredict` — Option to predict the terminal signal deterministically
`true` (default) | `false`

Option to predict the termination signal deterministically, specified as one of the following values.

true — Use deterministic network prediction.
false — Use stochastic network prediction.
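The following sketch creates an approximator with stochastic prediction by using the UseDeterministicPredict name-value argument and calls predict repeatedly on the same input. The small network is an illustrative assumption, and this page does not describe exactly how the stochastic prediction is sampled, so the values in samples may or may not vary from call to call.

```matlab
env = rlPredefinedEnv("CartPole-Continuous");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Minimal next-observation network (illustrative)
net = dlnetwork([
    featureInputLayer(obsInfo.Dimension(1),Name="nextState")
    fullyConnectedLayer(2)
    softmaxLayer
    ]);

% Request stochastic rather than deterministic prediction
isdFcnAppx = rlIsDoneFunction(net,obsInfo,actInfo, ...
    NextObservationInputNames="nextState", ...
    UseDeterministicPredict=false);

% Repeated predictions for the same next observation
nxtobs = rand(obsInfo.Dimension);
samples = zeros(1,10);
for k = 1:10
    samples(k) = predict(isdFcnAppx,{},{},{nxtobs});
end
samples
```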

`UseDevice` — Computation device used for training and simulation
`"cpu"` (default) | `"gpu"`

Computation device used to perform operations such as gradient computation, parameter update and prediction during training and simulation, specified as either "cpu" or "gpu".

The "gpu" option requires both Parallel Computing Toolbox™ software and a CUDA® enabled NVIDIA® GPU. For more information on supported GPUs, see GPU Computing Requirements (Parallel Computing Toolbox).

You can use gpuDevice (Parallel Computing Toolbox) to query or select a local GPU device to be used with MATLAB®.

Note

Training or simulating an agent on a GPU involves device-specific numerical round-off errors. Because of these errors, you can get different results on a GPU and on a CPU for the same operation.

To speed up training by using parallel processing over multiple cores, you do not need to use this argument. Instead, when training your agent, use an rlTrainingOptions object in which the UseParallel option is set to true. For more information about training using multicore processors and GPUs, see Train Agents Using Parallel Computing and GPUs.

Example: "gpu"
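A hedged sketch of selecting the device at creation time follows: it requests the GPU only when canUseGPU reports a supported GPU (which also requires Parallel Computing Toolbox), and otherwise falls back to the CPU. The small network is illustrative.

```matlab
env = rlPredefinedEnv("CartPole-Continuous");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Minimal next-observation network (illustrative)
net = dlnetwork([
    featureInputLayer(obsInfo.Dimension(1),Name="nextState")
    fullyConnectedLayer(2)
    softmaxLayer
    ]);

% Use the GPU only when a supported device is available
if canUseGPU
    device = "gpu";
else
    device = "cpu";
end

isdFcnAppx = rlIsDoneFunction(net,obsInfo,actInfo, ...
    NextObservationInputNames="nextState", ...
    UseDevice=device);
isdFcnAppx.UseDevice
```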

`Learnables` — Learnable parameters of approximator object
cell array of `dlarray` objects

Learnable parameters of the approximator object, specified as a cell array of dlarray objects. This property contains the learnable parameters of the approximation model used by the approximator object.

Example: {dlarray(rand(256,4)),dlarray(rand(256,1))}
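As a sketch, you can read the Learnables property of an approximator built from the same network as the example on this page. Each fully connected layer contributes a weight array and a bias array, so there are six cells, and the total element count is 4*64+64 + 64*64+64 + 64*2+2 = 4610, consistent with the 4.6k learnables reported by summary in that example.

```matlab
env = rlPredefinedEnv("CartPole-Continuous");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Same network as in the example on this page
net = dlnetwork([
    featureInputLayer(obsInfo.Dimension(1),Name="nextState")
    fullyConnectedLayer(64)
    reluLayer
    fullyConnectedLayer(64)
    reluLayer
    fullyConnectedLayer(2)
    softmaxLayer(Name="isdone")
    ]);
isdFcnAppx = rlIsDoneFunction(net,obsInfo,actInfo, ...
    NextObservationInputNames="nextState");

% One cell per learnable array (weights and bias of each fully connected layer)
params = isdFcnAppx.Learnables;
numArrays = numel(params)
numWeights = sum(cellfun(@numel,params))
```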

`State` — State of approximator object
cell array of `dlarray` objects

State of the approximator object, specified as a cell array of dlarray objects. For dlnetwork-based models, this property contains the Value column of the State property table of the dlnetwork model. The elements of the cell array are the state of the recurrent neural network used in the approximator (if any), as well as the state for the batch normalization layer (if used).

For model types that are not based on a dlnetwork object, this property is an empty cell array, since these model types do not support states.

Example: {dlarray(rand(256,1)),dlarray(rand(256,1))}
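Following the description above, a feedforward network with no recurrent or batch normalization layers has an empty State table, so the approximator State property is expected to be an empty cell array. The sketch below checks this under that assumption; the network is illustrative.

```matlab
env = rlPredefinedEnv("CartPole-Continuous");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Feedforward network: no recurrent or batch normalization layers
net = dlnetwork([
    featureInputLayer(obsInfo.Dimension(1),Name="nextState")
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(2)
    softmaxLayer
    ]);
isdFcnAppx = rlIsDoneFunction(net,obsInfo,actInfo, ...
    NextObservationInputNames="nextState");

% Number of rows in the dlnetwork State table and number of approximator state cells
height(net.State)
numel(isdFcnAppx.State)
```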

Object Functions

rlNeuralNetworkEnvironment: Environment model with deep neural network transition models

Examples


Create Is-Done Function and Predict Termination


Create an environment object and extract observation and action specifications. Alternatively, you can create specifications using rlNumericSpec and rlFiniteSetSpec.

env = rlPredefinedEnv("CartPole-Continuous");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

To approximate the is-done function, use a deep neural network. The network has one input channel for the next observations. The single output channel is for the predicted termination signal.

Create the neural network as a vector of layer objects.

net = [
    featureInputLayer( ...
                obsInfo.Dimension(1), ...
                Name="nextState")
    fullyConnectedLayer(64)
    reluLayer
    fullyConnectedLayer(64)
    reluLayer
    fullyConnectedLayer(2)
    softmaxLayer(Name="isdone")
    ];

Convert to dlnetwork object.

net = dlnetwork(net);

Plot network.

plot(net)


Initialize network and display the number of weights.

net = initialize(net);
summary(net);

   Initialized: true

   Number of learnables: 4.6k

   Inputs:
      1   'nextState'   4 features

Create an is-done function approximator object.

isDoneFcnAppx = rlIsDoneFunction(...
    net,obsInfo,actInfo,...
    NextObservationInputNames="nextState");

Using this is-done function approximator object, you can predict the termination signal based on the next observation. For example, predict the termination signal for a random next observation. Because in this example the termination signal depends only on the next observation, use empty cell arrays for the current observation and action inputs.

nxtobs = rand(obsInfo.Dimension);
predIsDone = predict(isDoneFcnAppx,{},{},{nxtobs})

predIsDone = 
0

You can obtain the probabilities of the two possible termination values using evaluate.

predIsDoneProb = evaluate(isDoneFcnAppx,{nxtobs})

predIsDoneProb = 1×1 cell array
    {2×1 single}

predIsDoneProb{1}

ans = 2×1 single column vector

    0.5405
    0.4595

The first element is the probability of obtaining 0 (no termination predicted), and the second is the probability of obtaining 1 (termination predicted).
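As a hedged illustration, and an assumption rather than a description of the object's internal implementation, you can relate the 0/1 signal to these probabilities by taking the most probable class, which agrees with the predicted value of 0 in this example. Drawing a random sample from the same probabilities is one way to picture what a stochastic prediction might return. The probabilities below are copied from the output above.

```matlab
% Class probabilities copied from the evaluate output above
p = [0.5405; 0.4595];

% Index 1 corresponds to 0 (no termination), index 2 to 1 (termination)
[~,idx] = max(p);
isDoneFromProb = idx - 1

% One way to draw a random termination sample from the same probabilities
isDoneSample = double(rand < p(2))
```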

Version History

Introduced in R2022a
