Why does number of steps exceed max step per episode in RL toolbox training?

Question

mary on 14 Nov 2020

https://au.mathworks.com/matlabcentral/answers/647788-why-does-number-of-steps-exceed-max-step-per-episode-in-rl-toolbox-training

Answered: Zuzanna Klawikowska on 7 Feb 2024

Hi,

I am trying to simulate the energy management of an energy storage system in Simulink with the RL Toolbox. My time step is 1 hour and the time horizon is 24 hours, so I have chosen Ts = 1 and T = 24 in the RL Toolbox and set the maximum steps per episode to 24, as I only want one value per hour. But as the agent trains, I see in the Episode Manager that the number of steps changes from episode to episode and exceeds 24.

Can anybody help me with this issue?

1 Comment

Emmanouil Tzorakoleftherakis on 16 Nov 2020

Can you add some reproduction code? That would make it easier to debug.

Answer 1

Zuzanna Klawikowska on 7 Feb 2024

https://au.mathworks.com/matlabcentral/answers/647788-why-does-number-of-steps-exceed-max-step-per-episode-in-rl-toolbox-training#answer_1404421

I have the same problem.

My code:

Ts_agent = 60.0/1440.0 % Ts_agent = 0.0417

agentOpts = rlTD3AgentOptions( ...
    SampleTime=Ts_agent, ...
    DiscountFactor=0.995, ...
    ExperienceBufferLength=1000000, ...
    MiniBatchSize=100, ...
    NumStepsToLookAhead=24, ...
    TargetSmoothFactor=0.005, ...
    TargetUpdateFrequency=10);

[...]

T = 42
maxepisodes = 500;
maxsteps = ceil(T/Ts_agent) % 1008

trainingOptions = rlTrainingOptions( ...
    MaxEpisodes=maxepisodes, ...
    MaxStepsPerEpisode=maxsteps);

agent = rlTD3Agent(actor, [critic1,critic2], agentOpts);

trainingStats = train(agent, env, trainingOptions);

Details shown in the RL Episode Manager:

The number of episode steps reported is above the maximum value. Moreover, the number of steps actually performed, which can be seen in the graph below, is much smaller than the number shown in the Episode Manager. The simulation runs for about 2 time units instead of the declared 42: after about 40 actions, the simulation starts a new episode.

Interestingly, the simulation worked well for an agent with 3 actions, but stopped working when a 4th action was added.

The graph shows one of the actions performed by the agent. The sample time is interpreted correctly, since there are 24 distinct actions per simulation time unit. I cannot understand why the simulation only runs for 2 time units instead of 42.
