rlReplayMemory
Description
An off-policy reinforcement learning agent stores experiences in a circular experience buffer.
During training, the agent stores each of its experiences (S,A,R,S',D) in the buffer. Here:
S is the current observation of the environment.
A is the action taken by the agent.
R is the reward for taking action A.
S' is the next observation after taking action A.
D is the is-done signal after taking action A.
The agent then samples mini-batches of experiences from the buffer and uses these mini-batches to update its actor and critic function approximators.
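As a concrete illustration of the tuple, the following sketch builds a single experience as a structure with one field per element. It assumes the append object function (listed under Object Functions below) accepts an experience structure with these field names; the 4-element observation size and the values are illustrative.

% One experience (S,A,R,S',D) expressed as a structure, assuming a
% 4-element observation and a scalar action (values are illustrative).
experience.Observation     = {rand(4,1)};   % S  - current observation
experience.Action          = {1};           % A  - action taken by the agent
experience.Reward          = 0.5;           % R  - reward for taking action A
experience.NextObservation = {rand(4,1)};   % S' - next observation
experience.IsDone          = 0;             % D  - is-done signal (0: episode not finished)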
By default, built-in off-policy agents (DQN, DDPG, TD3, SAC, MBPO) use an rlReplayMemory object as their experience buffer. Agents uniformly sample data from this buffer.
You can replace the default experience buffer using one of the following alternative buffer objects.
rlPrioritizedReplayMemory — Prioritized nonuniform sampling of experiences
rlHindsightReplayMemory — Uniform sampling of experiences and generation of hindsight experiences by replacing goals with goal measurements
rlHindsightPrioritizedReplayMemory — Prioritized nonuniform sampling of experiences and generation of hindsight experiences
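For example, one way to swap in an alternative buffer is to assign a new buffer object to the agent's ExperienceBuffer property. The sketch below does this for a default DQN agent; the observation and action specifications are illustrative, and the code is an outline rather than a complete training setup.

% Illustrative specifications and a default DQN agent.
obsInfo = rlNumericSpec([4 1]);           % 4-element continuous observation
actInfo = rlFiniteSetSpec([-1 0 1]);      % three discrete actions
agent   = rlDQNAgent(obsInfo,actInfo);    % agent with default networks

% Replace the default rlReplayMemory buffer with a prioritized buffer.
agent.ExperienceBuffer = rlPrioritizedReplayMemory(obsInfo,actInfo);

After this assignment, subsequent training of the agent draws mini-batches from the prioritized buffer instead of the default uniform one.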
When you create a custom off-policy reinforcement learning agent, you can create an experience buffer using an rlReplayMemory object.
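A minimal sketch of creating such a buffer follows; the specification objects and the 10,000-experience capacity are illustrative.

% Create a replay memory buffer for a custom off-policy agent.
obsInfo = rlNumericSpec([4 1]);                    % observation specification
actInfo = rlNumericSpec([1 1]);                    % action specification
buffer  = rlReplayMemory(obsInfo,actInfo,10000);   % capacity of 10,000 experiences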
Creation
Description
Input Arguments
Properties
Object Functions
append | Append experiences to replay memory buffer
sample | Sample experiences from replay memory buffer
resize | Resize replay memory experience buffer
reset | Reset environment, agent, experience buffer, or policy object
allExperiences | Return all experiences in replay memory buffer
validateExperience | Validate experiences for replay memory
getActionInfo | Obtain action data specifications from reinforcement learning environment, agent, or experience buffer
getObservationInfo | Obtain observation data specifications from reinforcement learning environment, agent, or experience buffer
Examples
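The following sketch exercises the append and sample object functions listed above: it creates a buffer, appends randomly generated experiences, and then draws a uniform mini-batch. The specification sizes, the number of experiences, and the mini-batch size are illustrative.

% Create a buffer (illustrative specifications and capacity).
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 0 1]);
buffer  = rlReplayMemory(obsInfo,actInfo,10000);

% Append randomly generated experiences (S,A,R,S',D).
for i = 1:64
    experience.Observation     = {rand(4,1)};
    experience.Action          = {actInfo.Elements(randi(3))};
    experience.Reward          = rand;
    experience.NextObservation = {rand(4,1)};
    experience.IsDone          = 0;
    append(buffer,experience);
end

% Uniformly sample a mini-batch of 32 experiences.
miniBatch = sample(buffer,32);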
Version History
Introduced in R2022a