How to avoid repeated actions and manually end an episode for a DQN agent?

I'm using the Reinforcement Learning Toolbox to design and train a DQN agent. At each time step, the agent's task is to select a location on a grid map to move to in order to map the environment. The action space is discrete and composed of 24 actions, i.e., possible target points.
The goal is to map 85% of the environment. The optimal behaviour for the agent would be to select a nearby point, move to that point, and then repeat this scheme at each subsequent time step until the goal is achieved.
The problem I'm facing is that during training the agent explores different sequences of actions in each episode, among which there are very good ones. As the agent becomes greedy, it correctly performs the first action at the first time step and then repeats that same action in a loop for the remaining time steps until the end of the episode, failing to complete the mission. It seems like it does not learn a sequence of actions, as if the algorithm were designed to make the agent achieve its goal in the fewest possible steps. Am I missing something? Is there some parameter tuning that could improve this behaviour?
Moreover, I would like to ask how I can end an episode. I implemented a custom step function and I've seen that switching the 'IsDone' flag to true ends the episode, but it also means that the agent has reached the target. What if I want to end the episode when the agent performs an action that in reality would end the episode without completing the mission, i.e. without setting the IsDone flag to true?
The agent is a DQN agent, and the critic and agent parameters are the defaults. The neural network architecture is the dueling DQN architecture from the original paper.
Thanks in advance for your help!

Answers (1)

Emmanouil Tzorakoleftherakis
From what you are saying, it seems that training has not converged yet. During training, the agent may behave very well in an episode every now and then, but unless this behavior is consistent across multiple back-to-back episodes (i.e., a high average reward), that is not a sign of convergence. I would try getting the agent to explore more by reducing the epsilon decay rate and raising the epsilon minimum value. There could be other things going on as well, the most important being a reward signal that does not accurately describe the desired behavior.
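As a sketch of where those settings live, the epsilon-greedy parameters can be set through rlDQNAgentOptions before creating the agent (the values below are illustrative placeholders, not tuned recommendations):

% Slow down the epsilon decay and raise the exploration floor so the
% agent keeps exploring for longer. Values are illustrative only.
agentOpts = rlDQNAgentOptions;

% Epsilon is multiplied by (1 - EpsilonDecay) after each step, so a
% smaller decay rate keeps epsilon high for more of the training run.
agentOpts.EpsilonGreedyExploration.EpsilonDecay = 1e-4;  % default 0.005

% A higher minimum guarantees some random actions even late in training.
agentOpts.EpsilonGreedyExploration.EpsilonMin = 0.1;     % default 0.01

agent = rlDQNAgent(critic, agentOpts);  % 'critic' is your existing critic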
For your second question, I don't see how that prevents you from using the IsDone flag. Just put an OR condition and set IsDone to true when the target is reached OR when the agent picks a certain action.
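For reference, a minimal sketch of what that could look like inside a custom step function (the signature matches what rlFunctionEnv expects; CoverageFraction and the forbidden action indices are hypothetical placeholders for your own state and logic):

function [nextObs, reward, isDone, loggedSignals] = myStepFunction(action, loggedSignals)
% Sketch: end the episode either on success or on a disallowed action.
% CoverageFraction and forbiddenActions are hypothetical placeholders.

    % ... update the map and compute nextObs and reward here ...
    nextObs = loggedSignals.State;    % placeholder observation
    reward  = 0;                      % placeholder reward

    % Success: 85% of the environment has been mapped.
    targetReached = loggedSignals.CoverageFraction >= 0.85;

    % Failure: the agent picked an action that should abort the episode.
    forbiddenActions = [3 7 12];      % hypothetical action indices
    badAction = ismember(action, forbiddenActions);

    % Penalize the failure case through the reward so the agent learns
    % that this termination was undesirable.
    if badAction
        reward = reward - 10;         % illustrative penalty
    end

    % IsDone ends the episode in either case.
    isDone = targetReached || badAction;
end

Note that IsDone by itself does not mean "success" to the agent; what distinguishes reaching the target from an aborted episode is the reward returned on that final step.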

Release

R2020b
