
rlReplayMemory

Replay memory experience buffer

    Description

    An off-policy reinforcement learning agent stores experiences in an experience buffer. During training, the agent samples mini-batches of experiences from the buffer and uses these mini-batches to update its actor and critic function approximators. When you create a custom off-policy reinforcement learning agent, you can create a circular experience buffer using an rlReplayMemory object.
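    The following sketch outlines that workflow. It is only a minimal illustration; the specifications, buffer length, and mini-batch size are placeholder values, and each call is described in detail in the sections and examples below.

    % Sketch: the append/sample cycle used by a custom off-policy agent.
    obsInfo = rlNumericSpec([3 1]);      % placeholder observation specification
    actInfo = rlNumericSpec([1 1]);      % placeholder action specification
    buffer = rlReplayMemory(obsInfo,actInfo,10000);

    % During training, the agent appends each experience it collects ...
    exp.Observation = {rand(3,1)};
    exp.Action = {rand(1,1)};
    exp.NextObservation = {rand(3,1)};
    exp.Reward = rand(1);
    exp.IsDone = 0;
    append(buffer,exp);

    % ... and periodically samples a mini-batch of stored experiences
    % to update its actor and critic approximators.
    miniBatch = sample(buffer,1);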

    Creation

    Description

    buffer = rlReplayMemory(obsInfo,actInfo) creates a replay memory experience buffer that is compatible with the observation and action specifications in obsInfo and actInfo, respectively.

    buffer = rlReplayMemory(obsInfo,actInfo,maxLength) sets the maximum length of the buffer by setting the MaxLength property.
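    For example, the following minimal sketch shows both syntaxes; the specification dimensions and maximum length are placeholders, not values from this page.

    % Hypothetical observation and action specifications.
    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlNumericSpec([1 1]);

    % Create a buffer with the default maximum length.
    buffer = rlReplayMemory(obsInfo,actInfo);

    % Create a buffer that stores at most 10,000 experiences.
    buffer = rlReplayMemory(obsInfo,actInfo,10000);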

    Input Arguments

    obsInfo — Observation specifications

    Observation specifications, specified as a reinforcement learning specification object or an array of specification objects defining properties such as dimensions, data type, and names of the observation signals.

    You can extract the observation specifications from an existing environment or agent using getObservationInfo. You can also construct the specifications manually using rlFiniteSetSpec or rlNumericSpec.

    actInfo — Action specifications

    Action specifications, specified as a reinforcement learning specification object defining properties such as dimensions, data type, and names of the action signals.

    You can extract the action specifications from an existing environment or agent using getActionInfo. You can also construct the specification manually using rlFiniteSetSpec or rlNumericSpec.
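    For example, one common way to obtain compatible specifications is to query an existing environment, as in the following sketch. The predefined environment used here ("CartPole-Discrete") is only an illustrative assumption.

    % Sketch: extract specifications from an existing environment.
    env = rlPredefinedEnv("CartPole-Discrete");
    obsInfo = getObservationInfo(env);
    actInfo = getActionInfo(env);

    % Create a replay memory that is compatible with this environment.
    buffer = rlReplayMemory(obsInfo,actInfo);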

    Properties

    MaxLength — Maximum buffer length

    This property is read-only.

    Maximum buffer length, specified as a positive integer.

    This property is read-only.

    Number of experiences in buffer, specified as a nonnegative integer.
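    For example, you can query the maximum buffer length using dot notation, as in the following sketch; the specifications and length are placeholder values.

    % Sketch: query the read-only MaxLength property.
    buffer = rlReplayMemory(rlNumericSpec([3 1]),rlNumericSpec([2 1]),20000);
    buffer.MaxLength     % maximum capacity of the circular buffer, 20000 here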

    Object Functions

    Examples

    Create Experience Buffer

    Define observation specifications for the environment. For this example, assume that the environment has a single observation channel with three continuous signals in specified ranges.

    obsInfo = rlNumericSpec([3 1],...
        LowerLimit=0,...
        UpperLimit=[1;5;10]);

    Define action specifications for the environment. For this example, assume that the environment has a single action channel with two continuous signals in specified ranges.

    actInfo = rlNumericSpec([2 1],...
        LowerLimit=0,...
        UpperLimit=[5;10]);

    Create an experience buffer with a maximum length of 20,000.

    buffer = rlReplayMemory(obsInfo,actInfo,20000);

    Append a single experience to the buffer using a structure. Each experience contains the following elements: current observation, action, next observation, reward, and is-done signal.

    For this example, create an experience with random observation, action, and reward values. Indicate that this experience is not a terminal condition by setting the IsDone value to 0.

    exp.Observation = {obsInfo.UpperLimit.*rand(3,1)};
    exp.Action = {actInfo.UpperLimit.*rand(2,1)};
    exp.NextObservation = {obsInfo.UpperLimit.*rand(3,1)};
    exp.Reward = 10*rand(1);
    exp.IsDone = 0;

    Append the experience to the buffer.

    append(buffer,exp);

    You can also append a batch of experiences to the experience buffer using a structure array. For this example, append a sequence of 100 random experiences, with the final experience representing a terminal condition.

    for i = 1:100
        expBatch(i).Observation = {obsInfo.UpperLimit.*rand(3,1)};
        expBatch(i).Action = {actInfo.UpperLimit.*rand(2,1)};
        expBatch(i).NextObservation = {obsInfo.UpperLimit.*rand(3,1)};
        expBatch(i).Reward = 10*rand(1);
        expBatch(i).IsDone = 0;
    end
    expBatch(100).IsDone = 1;
    
    append(buffer,expBatch);

    After appending experiences to the buffer, you can sample mini-batches of experiences for training your RL agent. For example, randomly sample a batch of 50 experiences from the buffer.

    miniBatch = sample(buffer,50);

    You can sample a horizon of data from the buffer. For example, sample a horizon of 10 consecutive experiences with a discount factor of 0.95.

    horizonSample = sample(buffer,1,...
        NStepHorizon=10,...
        DiscountFactor=0.95);

    The returned sample includes the following information.

    • Observation and Action are the observation and action from the first experience in the horizon.

    • NextObservation and IsDone are the next observation and termination signal from the final experience in the horizon.

    • Reward is the cumulative reward across the horizon using the specified discount factor.
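    As a rough illustration of the last point, the returned Reward is consistent with a discounted sum of the per-step rewards across the horizon. The following sketch shows that arithmetic on hypothetical reward values; it is not output from the sample function.

    % Sketch: cumulative discounted reward over a 10-step horizon.
    gamma = 0.95;                    % discount factor
    stepRewards = 10*rand(1,10);     % hypothetical per-step rewards
    horizonReward = sum(gamma.^(0:9).*stepRewards)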

    You can also sample a sequence of consecutive experiences. In this case, the structure fields contain arrays with values for all sampled experiences.

    sequenceSample = sample(buffer,1,...
        SequenceLength=20);

    Create Experience Buffer with Multiple Observation Channels

    Define observation specifications for the environment. For this example, assume that the environment has two observation channels: one channel with two continuous observations and one channel with a three-valued discrete observation.

    obsContinuous = rlNumericSpec([2 1],...
        LowerLimit=0,...
        UpperLimit=[1;5]);
    obsDiscrete = rlFiniteSetSpec([1 2 3]);
    obsInfo = [obsContinuous obsDiscrete];

    Define action specifications for the environment. For this example, assume that the environment has a single action channel with two continuous signals in specified ranges.

    actInfo = rlNumericSpec([2 1],...
        LowerLimit=0,...
        UpperLimit=[5;10]);

    Create an experience buffer with a maximum length of 5,000.

    buffer = rlReplayMemory(obsInfo,actInfo,5000);

    Append a sequence of 50 random experiences to the buffer.

    for i = 1:50
        exp(i).Observation = ...
            {obsInfo(1).UpperLimit.*rand(2,1) randi(3)};
        exp(i).Action = {actInfo.UpperLimit.*rand(2,1)};
        exp(i).NextObservation = ...
            {obsInfo(1).UpperLimit.*rand(2,1) randi(3)};
        exp(i).Reward = 10*rand(1);
        exp(i).IsDone = 0;
    end
    
    append(buffer,exp);

    After appending experiences to the buffer, you can sample mini-batches of experiences for training your RL agent. For example, randomly sample a batch of 10 experiences from the buffer.

    miniBatch = sample(buffer,10);

    Version History

    Introduced in R2022a
