rlPrioritizedReplayMemory
Description
An off-policy reinforcement learning agent stores experiences in a circular experience buffer.
During training, the agent stores each of its experiences (S,A,R,S',D) in the buffer. Here:
- S is the current observation of the environment. 
- A is the action taken by the agent. 
- R is the reward for taking action A. 
- S' is the next observation after taking action A. 
- D is the is-done signal after taking action A. 
The agent then samples mini-batches of experiences from the buffer and uses these mini-batches to update its actor and critic function approximators.
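The circular storage and uniform mini-batch sampling described above can be illustrated with a short, language-neutral Python sketch. It is not the Reinforcement Learning Toolbox implementation. The class name `CircularReplayBuffer`, the `Experience` tuple, and the choice of sampling without replacement are illustrative assumptions.

```python
import random
from collections import namedtuple

# One experience tuple (S, A, R, S', D), as described above.
Experience = namedtuple("Experience", ["obs", "action", "reward", "next_obs", "is_done"])


class CircularReplayBuffer:
    """Hypothetical fixed-capacity buffer: once full, each new experience overwrites the oldest one."""

    def __init__(self, max_length):
        self.max_length = max_length
        self.data = [None] * max_length
        self.next_index = 0  # slot that the next append writes to
        self.length = 0      # number of valid experiences currently stored

    def append(self, experience):
        self.data[self.next_index] = experience
        # Wrap around to slot 0 after the last slot, overwriting the oldest entry.
        self.next_index = (self.next_index + 1) % self.max_length
        self.length = min(self.length + 1, self.max_length)

    def sample(self, batch_size):
        # Uniform sampling (without replacement) over the stored experiences.
        indices = random.sample(range(self.length), min(batch_size, self.length))
        return [self.data[i] for i in indices]
```

For example, appending seven experiences to a buffer with `max_length=5` leaves only the five most recent ones stored; the first two have been overwritten.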
By default, built-in off-policy agents (DQN, DDPG, TD3, SAC, MBPO) use an rlReplayMemory object as their experience buffer. Agents uniformly sample data from this buffer. To perform nonuniform prioritized sampling [1], which can improve sample efficiency when training your agent, use an rlPrioritizedReplayMemory object. For more information on prioritized sampling, see Algorithms.
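To make the idea of prioritized sampling concrete, the following Python sketch implements the proportional variant described by Schaul et al. [1]. Each stored experience i has a priority p_i (for example |δ_i| + ε, where δ_i is its temporal-difference error). It is sampled with probability P(i) = p_i^α / Σ_k p_k^α, and its update is scaled by the importance-sampling weight w_i = (N·P(i))^(-β). Here β is annealed linearly toward 1 during training. This is an illustration of the technique from [1], not the rlPrioritizedReplayMemory source code. The function names, the ε term, normalizing the weights by the largest weight in the mini-batch, and the linear annealing schedule are assumptions for the sketch.

```python
import random


def sampling_probabilities(priorities, alpha):
    # P(i) = p_i^alpha / sum_k p_k^alpha; alpha = 0 recovers uniform sampling.
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def importance_weights(probs, indices, beta):
    # w_i = (N * P(i))^(-beta), divided by the largest weight in the batch
    # so that the weights only ever scale updates down.
    n = len(probs)
    weights = [(n * probs[i]) ** (-beta) for i in indices]
    max_w = max(weights)
    return [w / max_w for w in weights]


def annealed_beta(beta0, step, num_annealing_steps):
    # Linearly anneal the importance-sampling exponent from beta0 to 1.
    frac = min(1.0, step / num_annealing_steps)
    return beta0 + (1.0 - beta0) * frac


def updated_priorities(td_errors, eps=1e-6):
    # Proportional variant: p_i = |delta_i| + eps, so no experience gets zero probability.
    return [abs(d) + eps for d in td_errors]


def sample_batch(priorities, batch_size, alpha, beta, rng=random):
    probs = sampling_probabilities(priorities, alpha)
    # Sample indices with replacement, weighted by P(i).
    indices = rng.choices(range(len(probs)), weights=probs, k=batch_size)
    return indices, importance_weights(probs, indices, beta)
```

For example, with priorities `[1.0, 1.0, 2.0]` and `alpha=1.0`, the third experience is sampled half the time. With `beta=1.0` its update weight is half that of the first experience, which compensates for the bias introduced by sampling it more often.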
For goal-conditioned tasks, you can also replace your experience buffer with one of the following hindsight replay memory objects.
- rlHindsightReplayMemory: Uniform sampling of experiences and generation of hindsight experiences by replacing goals with goal measurements
- rlHindsightPrioritizedReplayMemory: Prioritized nonuniform sampling of experiences and generation of hindsight experiences
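The hindsight idea mentioned in the list, replacing goals with goal measurements, can be sketched in Python. An episode that missed its original goal is relabeled as if the goal measurement actually reached at the end of the episode had been the goal all along, and the rewards are recomputed. This is a conceptual illustration, not the toolbox implementation. The `GoalExperience` layout (storing the goal as a separate field rather than as part of the observation), the sparse reward, the is-done rule, and the "use the final measurement" relabeling strategy are all hypothetical assumptions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GoalExperience:
    observation: tuple
    goal: tuple                    # goal the agent was conditioned on (assumed separate field)
    action: int
    reward: float
    next_observation: tuple
    next_goal_measurement: tuple   # measured (achieved) goal after taking the action
    is_done: bool


def goal_reached(measurement, goal, tol=1e-6):
    return all(abs(m - g) <= tol for m, g in zip(measurement, goal))


def sparse_reward(measurement, goal):
    # Hypothetical sparse reward: 0 when the goal is reached, -1 otherwise.
    return 0.0 if goal_reached(measurement, goal) else -1.0


def hindsight_relabel(episode):
    # Hypothetical "final" strategy: the last goal measurement becomes the new goal.
    new_goal = episode[-1].next_goal_measurement
    relabeled = []
    for exp in episode:
        reached = goal_reached(exp.next_goal_measurement, new_goal)
        relabeled.append(replace(
            exp,
            goal=new_goal,
            reward=sparse_reward(exp.next_goal_measurement, new_goal),
            is_done=reached,
        ))
    return relabeled
```

The relabeled experiences are stored alongside the original ones, so even episodes that never reach their intended goal contribute experiences with informative (nonfailing) rewards.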
Creation
Syntax
Description
Input Arguments
Properties
Object Functions
| Function | Description |
| --- | --- |
| append | Append experiences to replay memory buffer |
| sample | Sample experiences from replay memory buffer |
| resize | Resize replay memory experience buffer |
| reset | Reset environment, agent, experience buffer, or policy object |
| allExperiences | Return all experiences in replay memory buffer |
| validateExperience | Validate experiences for replay memory |
| getActionInfo | Obtain action data specifications from reinforcement learning environment, agent, or experience buffer |
| getObservationInfo | Obtain observation data specifications from reinforcement learning environment, agent, or experience buffer |
Examples
Limitations
- Prioritized experience replay does not support agents that use recurrent neural networks. 
Algorithms
References
[1] Schaul, Tom, John Quan, Ioannis Antonoglou, and David Silver. "Prioritized Experience Replay." arXiv:1511.05952 [cs], February 25, 2016. https://arxiv.org/abs/1511.05952.
Version History
Introduced in R2022b