Clarification on NumEpoch, MaxMiniBatchPerEpoch, and LearningFrequency in rlAgentDDPGOptions
- NumEpoch: How does this parameter relate to the overall training process, and how does it affect network updates?
- MaxMiniBatchPerEpoch: How does this limit the number of mini-batches processed during an epoch, and how does it interact with the sampling process?
- LearningFrequency: How does this parameter influence the frequency of updates relative to the agent’s sampling rate?
Hi @Fabian,
Thanks for your questions about those DDPG training options — NumEpoch, MaxMiniBatchPerEpoch, and LearningFrequency. I checked out the MathWorks docs and here’s a quick rundown:
NumEpoch is the number of passes the agent makes over the sampled experience data during a single learning step, so a higher value means more gradient updates per learning step.
MaxMiniBatchPerEpoch caps how many mini-batches the agent processes in each of those epochs — this helps keep training time and resource use in check.
LearningFrequency controls how often the agent updates its networks relative to the environment's sampling steps. For example, a value of 4 means one learning step occurs every 4 environment steps.
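As a quick sketch of where these options live, here is one way to set them on the agent options object (property-assignment form; the values are placeholders, and this assumes a release whose `rlDDPGAgentOptions` exposes these properties):

```matlab
% Illustrative settings only — not recommendations for your problem.
agentOpts = rlDDPGAgentOptions;
agentOpts.MiniBatchSize        = 32;  % experiences per mini-batch
agentOpts.NumEpoch             = 3;   % passes over sampled data per learning step
agentOpts.MaxMiniBatchPerEpoch = 2;   % cap on mini-batches per epoch
agentOpts.LearningFrequency    = 4;   % one learning step every 4 environment steps
```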
If you want, you can take a look at the official docs here: https://www.mathworks.com/help/reinforcement-learning/ref/rlddpgagentoptions.html
Let me know if you want me to help with some examples or anything else.
Hope this helps.
Hi @Fabian,
Thank you for your follow-up and for clarifying your definition of a training step as one environment sampling step.
You're absolutely right that `LearningFrequency` controls how many such environment steps must occur before a learning step is executed — for instance, with `LearningFrequency = 4`, the agent performs one learning step every 4 environment interactions.
Your Main Question:
“Do `NumEpoch` and `MaxMiniBatchPerEpoch` relate to what happens during a learning step or a training step?”
Both `NumEpoch` and `MaxMiniBatchPerEpoch` apply during a learning step, not during individual environment sampling steps (which you're referring to as training steps).
`NumEpoch` specifies how many passes are made over the sampled experience data during a single learning step. `MaxMiniBatchPerEpoch` determines how many mini-batches are processed per epoch during that learning step.
In other words:
- Every time a learning step is triggered (based on `LearningFrequency`),
- The agent may perform multiple gradient updates, controlled by these two parameters.
Simulation Script + Plot
To help visualize this, please see the MATLAB script below.
```matlab
%% DDPG Learning Schedule Simulation + Plotting
% Author: Umar
% Date: 09-13-25
clc; clear;

% ==== Parameters ====
LearningFrequency    = 4;
NumEpoch             = 3;
MaxMiniBatchPerEpoch = 2;
MiniBatchSize        = 32;
TotalSteps           = 20;

% Simulated replay buffer
ReplayBuffer = 1:500;

% Tracking metrics
learningSteps    = [];   % steps where learning happened
updateCounts     = [];   % count of updates per learning step
totalUpdateCount = 0;    % total number of network updates

fprintf('--- DDPG Learning Schedule Simulation ---\n\n');

for step = 1:TotalSteps
    fprintf('Environment Step %d\n', step);

    % Trigger learning every LearningFrequency steps
    if mod(step, LearningFrequency) == 0
        fprintf('  > Learning Triggered (Step %d)\n', step);
        learningSteps(end+1) = step;

        % Simulate sampling a batch from the replay buffer
        batchSize = 256;
        largeBatch = datasample(ReplayBuffer, batchSize, 'Replace', false);
        updatePerStep = 0;

        % Epoch loop
        for epoch = 1:NumEpoch
            fprintf('    Epoch %d/%d\n', epoch, NumEpoch);
            for mb = 1:MaxMiniBatchPerEpoch
                miniBatch = datasample(largeBatch, MiniBatchSize, 'Replace', false);
                fprintf('      Updating with MiniBatch %d/%d (Size: %d)\n', ...
                    mb, MaxMiniBatchPerEpoch, MiniBatchSize);
                updatePerStep = updatePerStep + 1;
                totalUpdateCount = totalUpdateCount + 1;
            end
        end
        updateCounts(end+1) = updatePerStep;
    end
end

fprintf('\n--- Simulation Complete ---\n');
fprintf('Total Environment Steps: %d\n', TotalSteps);
fprintf('Total Learning Steps: %d\n', numel(learningSteps));
fprintf('Total Network Updates: %d\n', totalUpdateCount);
fprintf('Updates per Learning Step: %s\n', mat2str(updateCounts));

%% ==== Plotting ====
figure;
bar(learningSteps, updateCounts, 0.5, 'FaceColor', [0.2 0.6 0.8]);
xlabel('Environment Step');
ylabel('Network Updates');
title('DDPG Learning Triggers and Network Updates');
grid on;
xticks(learningSteps);
ylim([0 max(updateCounts)+1]);
text(learningSteps, updateCounts + 0.2, ...
    compose('%d updates', updateCounts), ...
    'HorizontalAlignment', 'center', 'FontSize', 9);
```
This script simulates:
- 20 environment sampling steps
- Learning triggered every 4 steps (`LearningFrequency = 4`)
- Each learning step performs 3 epochs, with 2 mini-batches per epoch
Printed Output (Excerpt)
```
Total Environment Steps: 20
Total Learning Steps: 5
Total Network Updates: 30
Updates per Learning Step: [6 6 6 6 6]
```
Plot: The resulting bar chart shows:
- Which steps triggered learning (steps 4, 8, 12, 16, 20)
- That each learning step performed 6 network updates (`3 epochs × 2 mini-batches`)
This directly illustrates that:
- `LearningFrequency` regulates when learning occurs (based on environment steps)
- `NumEpoch` and `MaxMiniBatchPerEpoch` regulate how much learning happens within each learning step
Please feel free to experiment with the script by adjusting any of the parameters to match your own setup.
Let me know if you'd like me to clarify anything further.
Hi @Fabian, Haha, I’ll happily take that cold beer — thanks! 😄 Glad to hear the explanation helped clarify things. Feel free to reach out anytime if more questions come up.