How to modify actions in experiences during reinforcement learning training
Hi experts
I am working on a reinforcement learning project. The formulated problem has a huge discrete action set, so instead of using deep Q-learning with discrete actions, I turned to DDPG with a continuous action space. What I want to do is this: each time I get an action from the actor network, I discretize it to the closest valid discrete action. I then want to store in the experience not the original continuous action, but that closest discrete action. DDPG training in MATLAB seems to store the original action generated by the actor network (plus noise) by default. Is there any way to modify the stored action in the experience before it is pushed into the replay buffer? Thanks!
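The discretization step itself is straightforward. Below is a minimal sketch of a helper that snaps a continuous action vector to the nearest valid discrete action; `validActions` (one candidate action per row) and the function name are hypothetical, not part of any toolbox API.

```matlab
function aDisc = nearestValidAction(aCont, validActions)
% Hypothetical helper: snap a continuous action (column vector) to the
% closest row of validActions under the Euclidean norm.
%   aCont        - continuous action from the actor, n-by-1
%   validActions - matrix of valid discrete actions, one per row (m-by-n)
    dists = vecnorm(validActions - aCont(:)', 2, 2);  % distance to each candidate
    [~, idx] = min(dists);                            % index of nearest candidate
    aDisc = validActions(idx, :)';                    % return as a column vector
end
```

For example, with `validActions = [0; 1; 2]` and a continuous action of `1.3`, the helper would return `1`.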
Answers (1)
Emmanouil Tzorakoleftherakis
on 29 Jul 2022
If you are working in Simulink, you can use the "Last Action" port on the RL Agent block to indicate the action that was actually applied to the environment.
If your environment is in MATLAB, you can either move it to Simulink with a MATLAB Function block and follow the approach above, or write your own custom training loop, where you control exactly what gets stored in the experience buffer.
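To illustrate the custom-training-loop option, here is a rough sketch of collecting one episode while storing the discretized action in the experience. It assumes a custom MATLAB environment with the documented `reset`/`step` methods, an `rlReplayMemory` buffer, and a hypothetical `validActions` matrix (one valid discrete action per row); the noise model and variable names are assumptions, not toolbox defaults. The learning step (sampling from the buffer and updating the agent) is omitted.

```matlab
% Sketch: one episode of experience collection with discretized actions.
% buffer = rlReplayMemory(obsInfo, actInfo) created beforehand (R2022b+).
obs = reset(env);
isDone = false;
while ~isDone
    % Query the actor and add exploration noise (hypothetical Gaussian noise).
    aCell = getAction(agent, {obs});
    aCont = aCell{1} + noiseStdDev * randn(size(aCell{1}));

    % Snap to the closest valid discrete action (validActions: one per row).
    [~, idx] = min(vecnorm(validActions - aCont(:)', 2, 2));
    aDisc = validActions(idx, :)';

    % Step the environment with the discretized action.
    [nextObs, reward, isDone] = step(env, aDisc);

    % Store the *discretized* action in the experience, not the raw one.
    experience.Observation     = {obs};
    experience.Action          = {aDisc};
    experience.Reward          = reward;
    experience.NextObservation = {nextObs};
    experience.IsDone          = isDone;
    append(buffer, experience);

    obs = nextObs;
end
```

The key point is that because you build the `experience` struct yourself, you are free to put the discretized action in the `Action` field before appending it to the buffer.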