Why does the number of steps exceed the max steps per episode in RL Toolbox training?

Hi,
I am trying to simulate the energy management of an energy storage system in Simulink with the RL Toolbox. My time step is 1 hour and the time horizon is 24 hours, so I have chosen Ts=1 and T=24 in the RL Toolbox and set the max steps per episode to 24, as I only want one value per hour. But as the agent trains, I see in the Episode Manager that the step number changes from episode to episode and exceeds 24.
Can anybody help me with this issue?
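
A minimal sketch of the setup described above, for reference. The variable names and the MaxEpisodes value are placeholders that do not come from the post, and the agent and environment are left out. It assumes Reinforcement Learning Toolbox is installed.

```matlab
% Sketch only: names and MaxEpisodes are assumptions, not the poster's code.
Ts = 1;                   % agent sample time: 1 hour
Tf = 24;                  % time horizon: 24 hours
maxsteps = ceil(Tf/Ts);   % one agent step per hour, 24 per episode

% Cap each training episode at maxsteps agent steps.
trainOpts = rlTrainingOptions( ...
    MaxEpisodes=100, ...              % placeholder value
    MaxStepsPerEpisode=maxsteps);
```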

Answers (1)

Zuzanna Klawikowska on 7 Feb 2024
I have the same problem.
My code:
Ts_agent = 60.0/1440.0   % Ts_agent = 0.0417
agentOpts = rlTD3AgentOptions( ...
    SampleTime=Ts_agent, ...
    DiscountFactor=0.995, ...
    ExperienceBufferLength=1000000, ...
    MiniBatchSize=100, ...
    NumStepsToLookAhead=24, ...
    TargetSmoothFactor=0.005, ...
    TargetUpdateFrequency=10);
[...]
T = 42
maxepisodes = 500;
maxsteps = ceil(T/Ts_agent)   % 1008
trainingOptions = rlTrainingOptions( ...
    MaxEpisodes=maxepisodes, ...
    MaxStepsPerEpisode=maxsteps);
agent = rlTD3Agent(actor, [critic1,critic2], agentOpts);
trainingStats = train(agent, env, trainingOptions);
Details shown in the RL Episode Manager:
The number of episode steps shown is above the maximum value. Moreover, the number of steps actually performed, which can be seen in the graph below, is much smaller than the number shown in the manager. The simulation lasts about 2 time units instead of the declared 42: after about 40 actions, the simulation starts a new episode.

Interestingly, the simulation worked well for an agent with 3 actions but stopped working when a 4th action was added.

The graph shows one of the actions performed by the agent. The sample time is interpreted correctly: there are 24 different actions per 1 unit of simulation time. I cannot understand why the simulation only runs for 2 time units instead of 42.
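
One way to check whether individual episodes really exceed the cap is to compare the logged per-episode step counts against maxsteps once training has finished. The helper below is a hypothetical sketch (the name episodesOverCap is made up for illustration and is not part of RL Toolbox). It assumes the step counts are available as a numeric vector, for example from the EpisodeSteps field of the result returned by train.

```matlab
function idx = episodesOverCap(episodeSteps, maxsteps)
% episodesOverCap  Hypothetical helper, not part of RL Toolbox.
% Returns the indices of episodes whose logged step count exceeds maxsteps.
%   episodeSteps : numeric vector, one entry per episode
%   maxsteps     : scalar cap passed as MaxStepsPerEpisode
    idx = find(episodeSteps(:) > maxsteps);
end
```

Saved as episodesOverCap.m, it could be called as idx = episodesOverCap(trainingStats.EpisodeSteps, maxsteps). An empty result would suggest that the per-episode counts respect the cap and that the larger number seen in the manager might be a different quantity, such as a cumulative step count.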
