rlQValueRepresentation
(Not recommended) Q-Value function critic representation for reinforcement learning agents
Since R2020a
rlQValueRepresentation is not recommended. Use rlQValueFunction or rlVectorQValueFunction instead. For more information, see rlQValueRepresentation is not recommended.
Description
This object implements a Q-value function approximator to be used as a critic within a reinforcement learning agent. A Q-value function maps an observation-action pair to a scalar value representing the total long-term reward that the agent is expected to accumulate when it starts from the given observation and executes the given action. Q-value function critics therefore need both observations and actions as inputs. After you create an rlQValueRepresentation critic, use it to create an agent that relies on a Q-value function critic, such as an rlQAgent, rlDQNAgent, rlSARSAAgent, rlDDPGAgent, or rlTD3Agent. For more information on creating representations, see Create Policies and Value Functions.
Creation
Syntax
Description
Scalar Output Q-Value Critic
critic = rlQValueRepresentation(net,observationInfo,actionInfo,'Observation',obsName,'Action',actName) creates the Q-value function critic critic. net is the deep neural network used as an approximator, and must have both observations and action as inputs, and a single scalar output. This syntax sets the ObservationInfo and ActionInfo properties of critic respectively to the inputs observationInfo and actionInfo, containing the observations and action specifications. obsName must contain the names of the input layers of net that are associated with the observation specifications. The action name actName must be the name of the input layer of net that is associated with the action specifications.
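For example, the following is a minimal sketch of this syntax. The 4-D observation, 1-D action, layer sizes, and the layer names obsIn and actIn are illustrative assumptions, not part of this reference page.

% Observation and action specifications (assumed: 4-D observation, 1-D action)
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1]);

% Network with an observation path and an action path merged into a scalar output
obsPath = [featureInputLayer(4,'Normalization','none','Name','obsIn')
           fullyConnectedLayer(16,'Name','obsFC')];
actPath = [featureInputLayer(1,'Normalization','none','Name','actIn')
           fullyConnectedLayer(16,'Name','actFC')];
comPath = [additionLayer(2,'Name','add')
           reluLayer('Name','relu')
           fullyConnectedLayer(1,'Name','QValue')];

net = layerGraph(obsPath);
net = addLayers(net,actPath);
net = addLayers(net,comPath);
net = connectLayers(net,'obsFC','add/in1');
net = connectLayers(net,'actFC','add/in2');

critic = rlQValueRepresentation(net,obsInfo,actInfo, ...
    'Observation',{'obsIn'},'Action',{'actIn'});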
critic = rlQValueRepresentation(tab,observationInfo,actionInfo) creates the Q-value function based critic critic with discrete action and observation spaces from the Q-value table tab. tab is an rlTable object containing a table with as many rows as the possible observations and as many columns as the possible actions. This syntax sets the ObservationInfo and ActionInfo properties of critic respectively to the inputs observationInfo and actionInfo, which must be rlFiniteSetSpec objects containing the specifications for the discrete observation and action spaces, respectively.
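For instance, a minimal sketch with four discrete observations and two discrete actions (the specific set values are illustrative assumptions):

% Discrete observation and action spaces
obsInfo = rlFiniteSetSpec([1 2 3 4]);
actInfo = rlFiniteSetSpec([1 2]);

% Q-table with one row per observation and one column per action
tab = rlTable(obsInfo,actInfo);
tab.Table = rand(4,2);     % optional: nonzero initialization of the Q-values

critic = rlQValueRepresentation(tab,obsInfo,actInfo);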
critic = rlQValueRepresentation({basisFcn,W0},observationInfo,actionInfo) creates a Q-value function based critic critic using a custom basis function as the underlying approximator. The first input argument is a two-element cell array in which the first element contains the handle basisFcn to a custom basis function and the second element contains the initial weight vector W0. Here the basis function must have both observations and action as inputs, and W0 must be a column vector. This syntax sets the ObservationInfo and ActionInfo properties of critic respectively to the inputs observationInfo and actionInfo.
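As an illustration, a minimal sketch with an assumed 3-D observation, a 1-D action, and a simple quadratic basis:

% Custom basis returning a column vector built from the observation and action
obsInfo  = rlNumericSpec([3 1]);
actInfo  = rlNumericSpec([1 1]);
basisFcn = @(obs,act) [obs; act; obs.^2; act.^2];
W0       = rand(8,1);      % one weight per basis element

critic = rlQValueRepresentation({basisFcn,W0},obsInfo,actInfo);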
Multi-Output Discrete Action Space Q-Value Critic
critic = rlQValueRepresentation(net,observationInfo,actionInfo,'Observation',obsName) creates the multi-output Q-value function critic critic for a discrete action space. net is the deep neural network used as an approximator, and must have only the observations as input and a single output layer having as many elements as the number of possible discrete actions. This syntax sets the ObservationInfo and ActionInfo properties of critic respectively to the inputs observationInfo and actionInfo, containing the observations and action specifications. Here, actionInfo must be an rlFiniteSetSpec object containing the specifications for the discrete action space. The observation names obsName must be the names of the input layers of net.
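For example, a minimal sketch for an assumed 4-D observation and three possible actions (the layer sizes and the layer name obsIn are illustrative assumptions):

% Network taking only the observation and returning one Q-value per action
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 0 1]);

net = [featureInputLayer(4,'Normalization','none','Name','obsIn')
       fullyConnectedLayer(24,'Name','fc1')
       reluLayer('Name','relu1')
       fullyConnectedLayer(3,'Name','QValues')];   % three possible actions

critic = rlQValueRepresentation(net,obsInfo,actInfo,'Observation',{'obsIn'});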
critic = rlQValueRepresentation({basisFcn,W0},observationInfo,actionInfo) creates the multi-output Q-value function critic critic for a discrete action space using a custom basis function as the underlying approximator. The first input argument is a two-element cell array in which the first element contains the handle basisFcn to a custom basis function and the second element contains the initial weight matrix W0. Here the basis function must have only the observations as inputs, and W0 must have as many columns as the number of possible actions. This syntax sets the ObservationInfo and ActionInfo properties of critic respectively to the inputs observationInfo and actionInfo.
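As an illustration, a minimal sketch with an assumed 3-D observation and two discrete actions:

% Basis function of the observation only; one weight column per possible action
obsInfo  = rlNumericSpec([3 1]);
actInfo  = rlFiniteSetSpec([10 20]);
basisFcn = @(obs) [obs; obs.^2];
W0       = rand(6,2);      % column k weights the Q-value of the k-th action

critic = rlQValueRepresentation({basisFcn,W0},obsInfo,actInfo);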
Options
critic = rlQValueRepresentation(___,options) creates the value function based critic critic using the additional option set options, which is an rlRepresentationOptions object. This syntax sets the Options property of critic to the options input argument. You can use this syntax with any of the previous input-argument combinations.
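For instance, continuing the table-based sketch above, with an assumed learning rate and gradient threshold:

% Additional representation options
opts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);

critic = rlQValueRepresentation(tab,obsInfo,actInfo,opts);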
Input Arguments
Properties
Object Functions
rlDDPGAgent | Deep deterministic policy gradient (DDPG) reinforcement learning agent
rlTD3Agent | Twin-delayed deep deterministic (TD3) policy gradient reinforcement learning agent
rlDQNAgent | Deep Q-network (DQN) reinforcement learning agent
rlQAgent | Q-learning reinforcement learning agent
rlSARSAAgent | SARSA reinforcement learning agent
rlSACAgent | Soft actor-critic (SAC) reinforcement learning agent
getValue | Obtain estimated value from a critic given environment observations and actions
getMaxQValue | Obtain maximum estimated value over all possible actions from a Q-value function critic with discrete action space, given environment observations
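As a usage sketch, assuming the table-based critic and the discrete specifications from the earlier example:

% Estimated Q-value for a given observation-action pair
q = getValue(critic,{1},{2});

% Maximum estimated Q-value over all actions, and the index of the greedy action
[maxQ,actIdx] = getMaxQValue(critic,{1});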