Clarification on NumEpoch, MaxMiniBatchPerEpoch, and LearningFrequency in rlDDPGAgentOptions

Hello,
I'm currently working with rlDDPGAgentOptions, and while I have a solid understanding of most of the configuration parameters, I'm having trouble understanding three specific options: NumEpoch, MaxMiniBatchPerEpoch, and LearningFrequency.
My main confusion revolves around how and when the networks are updated, considering the agent's sampling times. Specifically:
  • NumEpoch: How does this parameter relate to the overall training process, and how does it affect network updates?
  • MaxMiniBatchPerEpoch: How does this limit the number of mini-batches processed during an epoch, and how does it interact with the sampling process?
  • LearningFrequency: How does this parameter influence the frequency of updates relative to the agent’s sampling rate?
Any clarification on these points would be greatly appreciated!
Thank you in advance for your help!
Best regards,
Fabián.

5 Comments

Hi @Fabian,

Thanks for your questions about those DDPG training options — NumEpoch, MaxMiniBatchPerEpoch, and LearningFrequency. I checked out the MathWorks docs and here’s a quick rundown:

NumEpoch is basically how many times the agent passes over the sampled experience data within one learning step, so more epochs mean more gradient updates per learning step.

MaxMiniBatchPerEpoch sets a cap on how many mini-batches the agent processes in each epoch — this helps keep training time and resource use in check.

LearningFrequency controls how often the agent updates its networks relative to the environment's sampling steps. For example, a value of 4 means the networks update once every 4 environment steps.
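As a rough sketch of how these might be set together (assuming a recent release where these three properties are available on `rlDDPGAgentOptions`; the specific values below are just illustrative, not recommendations):

```matlab
% Illustrative sketch: configuring the three options on rlDDPGAgentOptions.
% Check the reference page for your release to confirm property availability.
opts = rlDDPGAgentOptions( ...
    MiniBatchSize = 64, ...
    LearningFrequency = 4, ...      % one learning step every 4 environment steps
    NumEpoch = 3, ...               % 3 passes over the sampled data per learning step
    MaxMiniBatchPerEpoch = 100);    % cap of 100 mini-batches within each epoch
```

With settings like these, each learning step can perform up to NumEpoch × MaxMiniBatchPerEpoch gradient updates, depending on how much data is available in the buffer.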

If you want, you can take a look at the official docs here: https://www.mathworks.com/help/reinforcement-learning/ref/rlddpgagentoptions.html

Let me know if you want me to help with some examples or anything else.

Hope this helps.

Hi @Umar,
Thank you for your response. From it, I understand that LearningFrequency refers to how many training steps must pass for the networks to update, or in other words, how many steps must occur for a learning step to be executed. However, I am still unclear about the function of NumEpoch and MaxMiniBatchPerEpoch. Do these parameters relate to what happens during a learning step or a training step? Just to clarify, by training step I mean the environment's sampling steps.

Hi @Fabian,

Thank you for your follow-up and for clarifying your definition of a training step as one environment sampling step.

You're absolutely right that `LearningFrequency` controls how many such environment steps must occur before a learning step is executed — for instance, with `LearningFrequency = 4`, the agent performs one learning step every 4 environment interactions.

Your Main Question:

“Do `NumEpoch` and `MaxMiniBatchPerEpoch` relate to what happens during a learning step or a training step?”

Both `NumEpoch` and `MaxMiniBatchPerEpoch` apply during a learning step, not during individual environment sampling steps (which you're referring to as training steps).

`NumEpoch` specifies how many passes are made over the sampled experience data during a single learning step. `MaxMiniBatchPerEpoch` determines how many mini-batches are processed per epoch during that learning step.

In other words:

  • Every time a learning step is triggered (based on `LearningFrequency`),
  • The agent may perform multiple gradient updates, controlled by these two parameters.

Simulation Script + Plot

To help visualize this, please see the MATLAB script below.

%% DDPG Learning Schedule Simulation + Plotting
% Author: Umar
% Date: 09-13-25
clc; clear;
% ==== Parameters ====
LearningFrequency = 4;
NumEpoch = 3;
MaxMiniBatchPerEpoch = 2;
MiniBatchSize = 32;
TotalSteps = 20;
% Simulated Replay Buffer
ReplayBuffer = 1:500;
% Tracking metrics
learningSteps = [];          % Steps where learning happened
updateCounts = [];           % Count of updates per learning step
totalUpdateCount = 0;        % Total number of network updates
fprintf('--- DDPG Learning Schedule Simulation ---\n\n');
for step = 1:TotalSteps
    fprintf('Environment Step %d\n', step);
    % Trigger learning
    if mod(step, LearningFrequency) == 0
        fprintf('  > Learning Triggered (Step %d)\n', step);
        learningSteps(end+1) = step;
        % Simulate sampling a large batch from the replay buffer
        % (datasample requires Statistics and Machine Learning Toolbox)
        batchSize = 256;
        largeBatch = datasample(ReplayBuffer, batchSize, 'Replace', false);
        updatePerStep = 0;
        % Epoch loop
        for epoch = 1:NumEpoch
            fprintf('    Epoch %d/%d\n', epoch, NumEpoch);
            for mb = 1:MaxMiniBatchPerEpoch
                miniBatch = datasample(largeBatch, MiniBatchSize, ...
                    'Replace', false);
                fprintf('      Updating with MiniBatch %d/%d (Size: %d)\n', ...
                    mb, MaxMiniBatchPerEpoch, MiniBatchSize);
                updatePerStep = updatePerStep + 1;
                totalUpdateCount = totalUpdateCount + 1;
            end
        end
        updateCounts(end+1) = updatePerStep;
    end
end
fprintf('\n--- Simulation Complete ---\n');
fprintf('Total Environment Steps: %d\n', TotalSteps);
fprintf('Total Learning Steps: %d\n', numel(learningSteps));
fprintf('Total Network Updates: %d\n', totalUpdateCount);
fprintf('Updates per Learning Step: %s\n', mat2str(updateCounts));
%% ==== Plotting ====
figure;
bar(learningSteps, updateCounts, 0.5, 'FaceColor', [0.2 0.6 0.8]);
xlabel('Environment Step');
ylabel('Network Updates');
title('DDPG Learning Triggers and Network Updates');
grid on;
xticks(learningSteps);
ylim([0 max(updateCounts)+1]);
text(learningSteps, updateCounts + 0.2, ...
   compose('%d updates', updateCounts), ...
   'HorizontalAlignment', 'center', 'FontSize', 9);

This script simulates:

  • 20 environment sampling steps
  • Learning triggered every 4 steps (`LearningFrequency = 4`)
  • Each learning step performs 3 epochs, with 2 mini-batches per epoch

Printed Output (Excerpt)

Total Environment Steps: 20
Total Learning Steps: 5
Total Network Updates: 30
Updates per Learning Step: [6 6 6 6 6]

The bar chart shows:

  • Which steps triggered learning (steps 4, 8, 12, 16, 20)
  • That each learning step performed 6 network updates (`3 epochs × 2 mini-batches`)

This directly illustrates that:

  • `LearningFrequency` regulates when learning occurs (based on environment steps)
  • `NumEpoch` and `MaxMiniBatchPerEpoch` regulate how much learning happens within each learning step

Please feel free to experiment with the script by adjusting any of the parameters to match your own setup.

Let me know if you'd like me to clarify anything further.

Hi @Umar,
Thank you so much for your response. With your clarifications, I fully understand the implementation of the DDPG algorithm in MATLAB. You deserve a cold beer.
Best regards!
Fabián.

Hi @Fabian, Haha, I’ll happily take that cold beer — thanks! 😄 Glad to hear the explanation helped clarify things. Feel free to reach out anytime if more questions come up.

Answers (0)

Release: R2024b
Asked: on 13 Sep 2025
Commented: on 14 Sep 2025
