
rlHybridStochasticActorPolicy

Policy object to generate hybrid stochastic actions for custom training loops and application deployment

Since R2024b

    Description

    This object implements a hybrid stochastic policy, which returns stochastic hybrid actions given an input observation, according to two probability distributions (one discrete and one continuous). You can create an rlHybridStochasticActorPolicy object from an rlHybridStochasticActor, or extract it from an rlSACAgent with a hybrid action space. You can then train the policy object using a custom training loop or deploy it for your application using generatePolicyBlock or generatePolicyFunction. If UseMaxLikelihoodAction is set to true, the policy is deterministic and therefore does not explore. For more information on policies and value functions, see Create Policies and Value Functions.

    Creation

    Description

    policy = rlHybridStochasticActorPolicy(actor) creates the hybrid stochastic policy object policy from the rlHybridStochasticActor object actor. It also sets the Actor property of policy to the input argument actor.


    Properties


    Actor, specified as an rlHybridStochasticActor object.

    Option to enable maximum likelihood action, specified as a logical value:

    • false — The action is sampled from the probability distribution, which helps exploration.

    • true — The action is always the maximum likelihood action. In this case the policy is deterministic and therefore there is no exploration.

    Example: true
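
    For example, the following minimal sketch (assuming policy is an rlHybridStochasticActorPolicy object and obsInfo is the corresponding observation specification, as in the example below) makes the policy deterministic:

    % Always return the maximum likelihood action, so repeated calls with
    % the same observation produce the same action.
    policy.UseMaxLikelihoodAction = true;

    obs = {rand(obsInfo.Dimension)};
    act1 = getAction(policy,obs);
    act2 = getAction(policy,obs);   % identical to act1, no sampling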

    Normalization method, returned as an array in which each element (one for each input channel defined in the ObservationInfo and ActionInfo properties, in that order) is one of the following values:

    • "none" — Do not normalize the input.

    • "rescale-zero-one" — Normalize the input by rescaling it to the interval between 0 and 1. The normalized input Y is (UMin)./(UpperLimitLowerLimit), where U is the nonnormalized input. Note that nonnormalized input values lower than LowerLimit result in normalized values lower than 0. Similarly, nonnormalized input values higher than UpperLimit result in normalized values higher than 1. Here, UpperLimit and LowerLimit are the corresponding properties defined in the specification object of the input channel.

    • "rescale-symmetric" — Normalize the input by rescaling it to the interval between –1 and 1. The normalized input Y is 2(ULowerLimit)./(UpperLimitLowerLimit) – 1, where U is the nonnormalized input. Note that nonnormalized input values lower than LowerLimit result in normalized values lower than –1. Similarly, nonnormalized input values higher than UpperLimit result in normalized values higher than 1. Here, UpperLimit and LowerLimit are the corresponding properties defined in the specification object of the input channel.

    Note

    When you specify the Normalization property of rlAgentInitializationOptions, normalization is applied only to the approximator input channels corresponding to rlNumericSpec specification objects in which both the UpperLimit and LowerLimit properties are defined. After you create the agent, you can use setNormalizer to assign normalizers that use any normalization method. For more information on normalizer objects, see rlNormalizer.

    Example: "rescale-symmetric"

    Observation specifications, returned as an rlFiniteSetSpec or rlNumericSpec object or an array containing a mix of such objects. Each element in the array defines the properties of an environment observation channel, such as its dimensions, data type, and name.

    Action specification, returned as a vector consisting of one rlFiniteSetSpec object followed by one rlNumericSpec object. Each element defines the properties of an environment action channel, such as its dimensions, data type, and name.

    Note

    Hybrid action spaces require two action channels: one for the discrete part of the action and one for the continuous part.

    Sample time of the policy, specified as a positive scalar or as -1.

    Within a MATLAB® environment, the policy is executed every time you call it within your custom training loop, so SampleTime does not affect the timing of the policy execution.

    Within a Simulink® environment, the Policy block that uses the policy object executes every SampleTime seconds of simulation time. If SampleTime is -1 the block inherits the sample time from its input signals. Set SampleTime to -1 when the block is a child of an event-driven subsystem.

    Note

    Set SampleTime to a positive scalar when the block is not a child of an event-driven subsystem. Doing so ensures that the block executes at appropriate intervals when input signal sample times change due to model variations.

    Regardless of the type of environment, the time interval between consecutive elements in the output experience returned by sim is always SampleTime.

    If SampleTime is -1, for Simulink environments, the time interval between consecutive elements in the returned output experience reflects the timing of the events that trigger the Policy block execution, while for MATLAB environments, this time interval is considered equal to 1.

    Example: SampleTime=-1
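
    For example, a minimal sketch (assuming policy already exists and is used by a Simulink Policy block):

    % Execute the Policy block every 0.1 seconds of simulation time.
    policy.SampleTime = 0.1;

    % Alternatively, inherit the sample time from the block input signals
    % (use this setting when the block is inside an event-driven subsystem).
    policy.SampleTime = -1;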

    Object Functions

    generatePolicyBlock — Generate Simulink block that evaluates policy of an agent or policy object
    generatePolicyFunction — Generate MATLAB function that evaluates policy of an agent or policy object
    getAction — Obtain action from agent, actor, or policy object given environment observations
    getLearnableParameters — Obtain learnable parameter values from agent, function approximator, or policy object
    reset — Reset environment, agent, experience buffer, or policy object
    setLearnableParameters — Set learnable parameter values of agent, function approximator, or policy object

    Examples


    Create observation and action specification objects. For this example, define a continuous four-dimensional observation space, and a hybrid action space consisting of a single discrete scalar that can be -5, 0, or 5, and a continuous column vector containing two doubles each between -4 and 4.

    obsInfo = rlNumericSpec([4 1]);
    actInfo = [ 
        rlFiniteSetSpec([-5,0,5]) 
        rlNumericSpec([2 1], ...
        LowerLimit=-4, ...
        UpperLimit=4) ];

    Alternatively, use getObservationInfo and getActionInfo to extract the specification objects from an environment.
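
    For instance, a minimal sketch (here, env is a hypothetical environment object with a hybrid action space):

    % Extract the specifications directly from an existing environment.
    obsInfo = getObservationInfo(env);
    actInfo = getActionInfo(env);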

    Create a default hybrid SAC agent using the observation and action specifications.

    agent = rlSACAgent(obsInfo,actInfo);

    Extract the hybrid stochastic actor from the agent.

    actor = getActor(agent);

    Create an rlHybridStochasticActorPolicy object from the actor.

    policy = rlHybridStochasticActorPolicy(actor)
    policy = 
      rlHybridStochasticActorPolicy with properties:
    
                         Actor: [1x1 rl.function.rlHybridStochasticActor]
        UseMaxLikelihoodAction: 0
                 Normalization: "none"
               ObservationInfo: [1x1 rl.util.rlNumericSpec]
                    ActionInfo: [2x1 rl.util.RLDataSpec]
                    SampleTime: -1
    
    

    Check the policy with a random observation input.

    act = getAction(policy,{rand(obsInfo.Dimension)});

    Display the discrete and continuous parts of the action.

    act{1}
    ans = 
    5
    
    act{2}
    ans = 2×1
    
       -2.1120
        2.1320
    
    

    You can now train the policy with a custom training loop and then deploy it to your application.
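
    For example, a minimal deployment sketch using the object functions listed above (see generatePolicyFunction and generatePolicyBlock for the available options):

    % Generate a MATLAB function that evaluates the trained policy,
    % suitable for code generation and deployment.
    generatePolicyFunction(policy);

    % Alternatively, generate a Simulink block that evaluates the policy.
    generatePolicyBlock(policy);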

    Version History

    Introduced in R2024b