MATLAB Answers

Confusion in Critic network architecture design in DDPG

Hello all,
I am trying to implement the following architecture for a DDPG agent in MATLAB.
"In our design and implementation, we used a 2-layer fully-connected feedforward neural network to serve as the actor network, which includes 400 and 300 neurons in the first and second layers respectively, and utilized the ReLU function for activation. In the final output layer, we used tanh(·) as the activation function to bound the actions.
Similarly, for the critic network, we also used a 2-layer fully-connected feedforward neural network with 400 and 300 neurons in the first and second layers respectively, and with ReLU for activation. Besides, we utilized the L2 weight decay to prevent overfitting."
This is taken from a paper.
Now I have implemented the actor in the following way (don't worry about the hyperparameter values):
actorNetwork = [
    featureInputLayer(numObservations,'Normalization','none','Name','observation')
    fullyConnectedLayer(400,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(300,'Name','fc2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numActions,'Name','fc3')
    tanhLayer('Name','tanh1')
    scalingLayer('Name','ActorScaling1','Scale',[2.5;0.2618],'Bias',[-0.5;0])];
actorOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1,'L2RegularizationFactor',1e-4);
actor = rlDeterministicActorRepresentation(actorNetwork,observationInfo,actionInfo,...
    'Observation',{'observation'},'Action',{'ActorScaling1'},actorOptions);
However, I am confused about how to write the code for the critic according to that paper description. I have done the following:
statePath = [
    featureInputLayer(numObservations,'Normalization','none','Name','observation')
    fullyConnectedLayer(400,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(300,'Name','fc2')
    reluLayer('Name','relu2')
    additionLayer(2,'Name','add')
    fullyConnectedLayer(400,'Name','fc3')
    reluLayer('Name','relu3')
    fullyConnectedLayer(300,'Name','fc4')
    reluLayer('Name','relu4')
    fullyConnectedLayer(1,'Name','fc5')];
actionPath = [
    featureInputLayer(numActions,'Normalization','none','Name','action')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
%criticNetwork = connectLayers(criticNetwork,'fc5','add/in2');
criticOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1,'L2RegularizationFactor',1e-4); % L2 weight decay, per the paper
critic = rlQValueRepresentation(criticNetwork,observationInfo,actionInfo,...
    'Observation',{'observation'},'Action',{'action'},criticOptions);
But I am unsure about the 'additionLayer' and the 'actionPath'. Does my implementation match the paper's description?
Can anyone suggest?
Thanks.


Accepted Answer

Emmanouil Tzorakoleftherakis
Hello,
Does this paper use DDPG as well? Are there any images showing the network architecture? If it uses another algorithm, the critic may be implemented as a state-value network V(s).
DDPG uses a Q-network for the critic, which needs to take in both the state and the action (s,a). Reinforcement Learning Toolbox lets you implement this architecture by providing separate input "channels", or paths, for the state and the action. That allows you to use different layers in these two paths to extract features more efficiently. See, for example, the image below:
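To make the separate-path idea concrete, here is a sketch of such a critic following the usual Reinforcement Learning Toolbox DDPG pattern (layer names like 'CriticStateFC1' are illustrative, and the 400/300 sizes follow the paper). The key constraint is that the two branches feeding additionLayer must output vectors of the same length, here 300:

```matlab
% Separate state and action paths, merged by an additionLayer
statePath = [
    featureInputLayer(numObservations,'Normalization','none','Name','observation')
    fullyConnectedLayer(400,'Name','CriticStateFC1')
    reluLayer('Name','CriticStateRelu1')
    fullyConnectedLayer(300,'Name','CriticStateFC2')];
actionPath = [
    featureInputLayer(numActions,'Normalization','none','Name','action')
    fullyConnectedLayer(300,'Name','CriticActionFC1')]; % must match statePath output size
commonPath = [
    additionLayer(2,'Name','add')       % sums the 300-d state and action features
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(1,'Name','QValue')]; % scalar Q(s,a)
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
```

Note that in your attempt the additionLayer sits in the middle of statePath and its second input is never connected, so the graph is incomplete; the action path has to be wired into the addition layer as above.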
If you want, you can concatenate the observation and action inputs and use a common feature extraction path as follows:
% create a network to be used as underlying critic approximator
statePath = featureInputLayer(numObservations, 'Normalization', 'none', 'Name', 'state');
actionPath = featureInputLayer(numActions, 'Normalization', 'none', 'Name', 'action');
commonPath = [
    concatenationLayer(1,2,'Name','concat')
    fullyConnectedLayer(400,'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(300,'Name','CriticStateFC2')
    reluLayer('Name','CriticRelu2')
    fullyConnectedLayer(1,'Name','StateValue')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = addLayers(criticNetwork, commonPath);
criticNetwork = connectLayers(criticNetwork,'state','concat/in1');
criticNetwork = connectLayers(criticNetwork,'action','concat/in2');
plot(criticNetwork)
Hope that helps

  4 Comments

Emmanouil Tzorakoleftherakis
I will take a look and reply in your other post since this is a different question. If the answer to this question makes sense to you, make sure to accept it.
laha on 30 Nov 2020
Hello Emmanouil,
I tried training the agent, but it is performing quite poorly. I suspect this is a problem with the hyperparameter values, since I have not tuned anything. Now I have two questions:
  1. I am trying to understand the effects of the hyperparameters by reading some resources. Is there anything in MATLAB that can help with this, other than trial-and-error?
  2. How do I save the best-performing agent, given that I don't know the critical (reward) value in advance? Basically, I want to save the agent that achieves the maximum reward, or, say, the top-5 highest-rewarding agents.
Thanks.
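Regarding the second question, one relevant mechanism is the agent-saving options in rlTrainingOptions, which can save a copy of the agent to disk whenever a reward criterion is met. A sketch (the threshold of 100 and the episode limits are placeholder values to adjust for your problem):

```matlab
% Save any agent whose episode reward exceeds a threshold
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',5000,...
    'MaxStepsPerEpisode',500,...
    'SaveAgentCriteria','EpisodeReward',... % criterion checked after each episode
    'SaveAgentValue',100,...                % threshold for the criterion above
    'SaveAgentDirectory','savedAgents');    % .mat files are written here
trainingStats = train(agent,env,trainOpts);
```

You can then compare the saved agents' episode rewards in trainingStats and keep the best few.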


More Answers (0)
