Agent repeats same sequence of actions each episode
Can someone please help me understand why my RL agent outputs the same sequence of actions in every episode, regardless of the observations it receives from the environment? Here is an example of what I mean:
prev_state = 11.20 11.90 11.30 11.50
action = 0.00 0.00 0.00 0.00
new_state = 11.20 11.90 11.30 11.50
prev_state = 11.20 11.90 11.30 11.50
action = 0.10 0.10 -0.10 0.00
new_state = 11.30 12.00 11.20 11.50
prev_state = 11.30 12.00 11.20 11.50
action = 0.10 0.10 -0.10 0.00
new_state = 11.40 12.00 11.10 11.50
prev_state = 11.40 12.00 11.10 11.50
action = -0.10 -0.10 0.10 0.00
new_state = 11.30 11.90 11.20 11.50
prev_state = 11.30 11.90 11.20 11.50
action = 0.00 0.00 0.10 0.10
new_state = 11.30 11.90 11.30 11.60
Episode: 1/ 2 | Episode Reward : -5.00 | Episode Steps: 5 | Avg Reward : -5.00 | Step Count : 5 | Episode Q0 : 1.03
prev_state = 12.00 11.20 11.70 11.50
action = 0.00 0.00 0.00 0.00
new_state = 12.00 11.20 11.70 11.50
prev_state = 12.00 11.20 11.70 11.50
action = 0.10 0.10 -0.10 0.00
new_state = 12.00 11.30 11.60 11.50
prev_state = 12.00 11.30 11.60 11.50
action = 0.10 0.10 -0.10 0.00
new_state = 12.00 11.40 11.50 11.50
prev_state = 12.00 11.40 11.50 11.50
action = -0.10 -0.10 0.10 0.00
new_state = 11.90 11.30 11.60 11.50
prev_state = 11.90 11.30 11.60 11.50
action = 0.00 0.00 0.10 0.10
new_state = 11.90 11.30 11.70 11.60
Episode: 2/ 2 | Episode Reward : -5.00 | Episode Steps: 5 | Avg Reward : -5.00 | Step Count : 10 | Episode Q0 : 1.04
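To check the pattern mechanically rather than by eye, the log can be parsed and the per-episode action sequences compared. The following is only a minimal Python sketch, assuming the log keeps the exact `prev_state = ...` / `action = ...` / `Episode: ...` layout shown above; `parse_log` is a hypothetical helper written for this post, not part of any MATLAB toolbox, and the embedded log omits the `new_state` rows to keep it short.

```python
# Trimmed copy of the log above (new_state rows omitted for brevity).
LOG = """\
prev_state = 11.20 11.90 11.30 11.50
action = 0.00 0.00 0.00 0.00
prev_state = 11.20 11.90 11.30 11.50
action = 0.10 0.10 -0.10 0.00
prev_state = 11.30 12.00 11.20 11.50
action = 0.10 0.10 -0.10 0.00
prev_state = 11.40 12.00 11.10 11.50
action = -0.10 -0.10 0.10 0.00
prev_state = 11.30 11.90 11.20 11.50
action = 0.00 0.00 0.10 0.10
Episode: 1/ 2 | Episode Reward : -5.00 | Episode Steps: 5
prev_state = 12.00 11.20 11.70 11.50
action = 0.00 0.00 0.00 0.00
prev_state = 12.00 11.20 11.70 11.50
action = 0.10 0.10 -0.10 0.00
prev_state = 12.00 11.30 11.60 11.50
action = 0.10 0.10 -0.10 0.00
prev_state = 12.00 11.40 11.50 11.50
action = -0.10 -0.10 0.10 0.00
prev_state = 11.90 11.30 11.60 11.50
action = 0.00 0.00 0.10 0.10
Episode: 2/ 2 | Episode Reward : -5.00 | Episode Steps: 5
"""


def parse_log(log_text):
    """Hypothetical helper: group states and actions by episode.

    An 'Episode:' summary row is assumed to close the current episode.
    """
    episodes = []
    states, actions = [], []
    for raw in log_text.splitlines():
        line = raw.strip()
        key, _, values = line.partition("=")
        key = key.strip()
        if key == "prev_state":
            states.append(tuple(float(v) for v in values.split()))
        elif key == "action":
            actions.append(tuple(float(v) for v in values.split()))
        elif line.startswith("Episode:"):
            episodes.append({"states": states, "actions": actions})
            states, actions = [], []
    return episodes


episodes = parse_log(LOG)

# Compare the first two episodes: same actions, but do they start in the same state?
same_actions = episodes[0]["actions"] == episodes[1]["actions"]
same_start = episodes[0]["states"][0] == episodes[1]["states"][0]

print("identical action sequences:", same_actions)
print("identical initial states:  ", same_start)
```

For the log in this post, the check reports identical action sequences even though the two episodes start from different states, which is the behavior I am asking about.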
Let me know if you have any questions about the simulation.
More info on the simulation & my other issues: https://www.mathworks.com/matlabcentral/answers/555799-reinforcement-learning-sample-time