

Create and configure reinforcement learning agents

A reinforcement learning agent receives observations and a reward from the environment, and returns an action to the environment. During training, the agent continuously updates its parameters to improve its policy for the given environment.
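The exchange of observations, actions, and rewards described above can be sketched as a plain Python loop. This is a language-agnostic illustration, not the Reinforcement Learning Toolbox API. The `LineWorld` environment and the `TabularQAgent` class, with its `get_action` and `learn` methods, are hypothetical. The agent uses tabular Q-learning with epsilon-greedy exploration only as one example of how an agent updates its parameters during training.

```python
import random


class LineWorld:
    """Hypothetical toy environment: states 0..4 on a line; reaching state 4 ends the episode."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state  # initial observation

    def step(self, action):
        # Action 0 moves left, action 1 moves right (clipped at the left edge).
        move = 1 if action == 1 else -1
        self.state = max(0, self.state + move)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else -0.1
        return self.state, reward, done


class TabularQAgent:
    """Hypothetical agent: epsilon-greedy policy over a Q-table, updated with Q-learning."""

    def __init__(self, n_states, n_actions, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.q = [[0.0] * n_actions for _ in range(n_states)]  # the agent's learnable parameters
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def get_action(self, obs, explore=True):
        # With probability epsilon pick a random action, otherwise the greedy one.
        if explore and self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q[obs]))
        row = self.q[obs]
        return row.index(max(row))

    def learn(self, obs, action, reward, next_obs, done):
        # Q-learning target: reward plus discounted best next value (no bootstrap after a terminal step).
        target = reward if done else reward + self.gamma * max(self.q[next_obs])
        self.q[obs][action] += self.alpha * (target - self.q[obs][action])


def train(agent, env, episodes=200, max_steps=50):
    for _ in range(episodes):
        obs = env.reset()
        for _ in range(max_steps):
            action = agent.get_action(obs)                    # agent returns an action
            next_obs, reward, done = env.step(action)         # environment returns observation and reward
            agent.learn(obs, action, reward, next_obs, done)  # agent updates its parameters
            obs = next_obs
            if done:
                break


env = LineWorld()
agent = TabularQAgent(env.n_states, 2)
train(agent, env)
print([agent.get_action(s, explore=False) for s in range(env.n_states - 1)])
```

After training, the greedy policy is expected to choose action 1 (move right) in every non-terminal state, because moving right reaches the rewarded goal in the fewest steps.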

Reinforcement Learning Toolbox™ software provides built-in reinforcement learning agents that use several common algorithms, such as Q-learning, DQN, PG, AC, DDPG, TD3, SAC, and PPO. You can also implement your own custom agents.
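Q-learning and SARSA agents differ mainly in how they form the temporal-difference target. The following minimal sketch assumes a tabular setting where `next_q_values` holds the action values of the next state. All function names are hypothetical and are not toolbox functions. Q-learning bootstraps from the best next action (off-policy), while SARSA bootstraps from the next action the agent actually takes (on-policy).

```python
def q_learning_target(reward, next_q_values, gamma, done):
    """Off-policy target: bootstrap from the best action in the next state."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def sarsa_target(reward, next_q_values, next_action, gamma, done):
    """On-policy target: bootstrap from the action the agent actually takes next."""
    if done:
        return reward
    return reward + gamma * next_q_values[next_action]


def td_update(q_value, target, alpha):
    """Move the current estimate a fraction alpha toward the target."""
    return q_value + alpha * (target - q_value)


# One transition: reward 1.0, next-state action values [2.0, 6.0],
# and an exploratory policy that happens to pick action 0 next.
next_q = [2.0, 6.0]
print(q_learning_target(1.0, next_q, gamma=0.5, done=False))            # 1 + 0.5 * 6 = 4.0
print(sarsa_target(1.0, next_q, next_action=0, gamma=0.5, done=False))  # 1 + 0.5 * 2 = 2.0
print(td_update(3.0, 4.0, alpha=0.5))                                   # 3 + 0.5 * (4 - 3) = 3.5
```

When exploration selects a suboptimal next action, the SARSA target is lower than the Q-learning target, so SARSA learns the value of the policy it is actually following, exploration included.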

For an introduction to agents, see Reinforcement Learning Agents. For an introduction to policies, value functions, actors and critics, see Create Policies and Value Functions.


Apps

Reinforcement Learning Designer: Design, train, and simulate reinforcement learning agents (Since R2021a)


Blocks

RL Agent: Reinforcement learning agent


Functions

rlQAgent: Q-learning reinforcement learning agent
rlSARSAAgent: SARSA reinforcement learning agent
rlDQNAgent: Deep Q-network (DQN) reinforcement learning agent
rlPGAgent: Policy gradient (PG) reinforcement learning agent
rlACAgent: Actor-critic (AC) reinforcement learning agent
rlPPOAgent: Proximal policy optimization (PPO) reinforcement learning agent (Since R2019b)
rlTRPOAgent: Trust region policy optimization (TRPO) reinforcement learning agent (Since R2021b)
rlDDPGAgent: Deep deterministic policy gradient (DDPG) reinforcement learning agent
rlTD3Agent: Twin-delayed deep deterministic (TD3) policy gradient reinforcement learning agent (Since R2020a)
rlSACAgent: Soft actor-critic (SAC) reinforcement learning agent (Since R2020b)
rlQAgentOptions: Options for Q-learning agent
rlSARSAAgentOptions: Options for SARSA agent
rlDQNAgentOptions: Options for DQN agent
rlPGAgentOptions: Options for PG agent
rlACAgentOptions: Options for AC agent
rlPPOAgentOptions: Options for PPO agent (Since R2019b)
rlTRPOAgentOptions: Options for TRPO agent (Since R2021b)
rlDDPGAgentOptions: Options for DDPG agent
rlTD3AgentOptions: Options for TD3 agent (Since R2020a)
rlSACAgentOptions: Options for SAC agent (Since R2020b)
rlAgentInitializationOptions: Options for initializing reinforcement learning agents (Since R2020b)
rlConservativeQLearningOptions: Regularizer options object to train DQN and SAC agents (Since R2023a)
rlBehaviorCloningRegularizerOptions: Regularizer options object to train DDPG, TD3 and SAC agents (Since R2023a)
rlMBPOAgent: Model-based policy optimization (MBPO) reinforcement learning agent (Since R2022a)
rlMBPOAgentOptions: Options for MBPO agent (Since R2022a)
getActor: Extract actor from reinforcement learning agent
getCritic: Extract critic from reinforcement learning agent
setActor: Set actor of reinforcement learning agent
setCritic: Set critic of reinforcement learning agent
getAction: Obtain action from agent, actor, or policy object given environment observations (Since R2020a)
rlReplayMemory: Replay memory experience buffer (Since R2022a)
rlPrioritizedReplayMemory: Replay memory experience buffer with prioritized sampling (Since R2022b)
rlHindsightReplayMemory: Hindsight replay memory experience buffer (Since R2023a)
rlHindsightPrioritizedReplayMemory: Hindsight replay memory experience buffer with prioritized sampling (Since R2023a)
append: Append experiences to replay memory buffer (Since R2022a)
sample: Sample experiences from replay memory buffer (Since R2022a)
resize: Resize replay memory experience buffer (Since R2022b)
allExperiences: Return all experiences in replay memory buffer (Since R2022b)
validateExperience: Validate experiences for replay memory (Since R2023a)
generateHindsightExperiences: Generate hindsight experiences from hindsight experience replay buffer (Since R2023a)
getActionInfo: Obtain action data specifications from reinforcement learning environment, agent, or experience buffer
getObservationInfo: Obtain observation data specifications from reinforcement learning environment, agent, or experience buffer
reset: Reset environment, agent, experience buffer, or policy object (Since R2022a)
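Replay memory buffers such as `rlReplayMemory` store past experiences so that off-policy agents can learn from randomly sampled minibatches. The following is a minimal Python sketch of a fixed-capacity circular buffer with operations loosely analogous to `append`, `sample`, `resize`, and `allExperiences`. The `ReplayMemory` class and `Experience` record are hypothetical. Sampling is uniform, and the prioritized and hindsight variants are not shown. Overwriting the oldest experience once the buffer is full is an assumed design, not a description of the toolbox internals.

```python
import random
from collections import namedtuple

# Hypothetical experience record: (observation, action, reward, next observation, done flag).
Experience = namedtuple("Experience", ["obs", "action", "reward", "next_obs", "is_done"])


class ReplayMemory:
    """Hypothetical fixed-capacity circular buffer; once full, the oldest experience is overwritten."""

    def __init__(self, max_length, seed=None):
        if max_length < 1:
            raise ValueError("max_length must be at least 1")
        self.max_length = max_length
        self.buffer = []
        self.next_index = 0  # slot the next experience is written to
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.buffer)

    def append(self, experience):
        if len(self.buffer) < self.max_length:
            self.buffer.append(experience)
        else:
            self.buffer[self.next_index] = experience  # overwrite the oldest entry
        self.next_index = (self.next_index + 1) % self.max_length

    def sample(self, batch_size):
        # Uniform sampling without replacement, limited to what is stored.
        return self.rng.sample(self.buffer, min(batch_size, len(self.buffer)))

    def all_experiences(self):
        # Return experiences ordered from oldest to newest.
        if len(self.buffer) < self.max_length:
            return list(self.buffer)
        return self.buffer[self.next_index:] + self.buffer[:self.next_index]

    def resize(self, new_length):
        # Keep only the most recent experiences that fit in the new capacity.
        if new_length < 1:
            raise ValueError("new_length must be at least 1")
        kept = self.all_experiences()[-new_length:]
        self.buffer = kept
        self.max_length = new_length
        self.next_index = len(kept) % new_length


memory = ReplayMemory(max_length=3, seed=0)
for t in range(5):
    memory.append(Experience(obs=t, action=0, reward=float(t), next_obs=t + 1, is_done=False))
print([e.obs for e in memory.all_experiences()])  # [2, 3, 4]: the two oldest were overwritten
minibatch = memory.sample(2)
memory.resize(2)
print([e.obs for e in memory.all_experiences()])  # [3, 4]
```

A circular buffer keeps `append` constant-time and bounds memory use, at the cost of discarding the oldest experiences first.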


Topics

Agent Basics

Agent Types

Custom Agents