MATLAB Answers

Confusion in Critic network architecture design in DDPG

Hello all,
I am trying to implement the following architecture for a DDPG agent in MATLAB.
"In our design and implementation, we used a 2-layer fully-connected feedforward neural network to serve as the actor network, which includes 400 and 300 neurons in the first and second layers respectively, and utilized the ReLU function for activation. In the final output layer, we used tanh(·) as the activation function to bound the actions.
Similarly, for the critic network, we also used a 2-layer fully-connected feedforward neural network with 400 and 300 neurons in the first and second layers respectively, and with ReLU for activation. Besides, we utilized the L2 weight decay to prevent overfitting."
This is taken from a paper.
Now I have implemented the actor in the following way (don't worry about the hyperparameter values):
actorNetwork = [
    featureInputLayer(numObservations,'Normalization','none','Name','observation')
    fullyConnectedLayer(400,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(300,'Name','fc2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(numActions,'Name','fc3')
    tanhLayer('Name','tanh1')
    scalingLayer('Name','ActorScaling1','Scale',[2.5;0.2618],'Bias',[-0.5;0])];
actorOptions = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1,'L2RegularizationFactor',1e-4);
actor = rlDeterministicActorRepresentation(actorNetwork,observationInfo,actionInfo,...
    'Observation',{'observation'},'Action',{'ActorScaling1'},actorOptions);
However, I am confused about how to write the code for the critic according to that paper description. I have done the following:
statePath = [
    featureInputLayer(numObservations,'Normalization','none','Name','observation')
    fullyConnectedLayer(400,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(300,'Name','fc2')
    reluLayer('Name','relu2')
    additionLayer(2,'Name','add')
    fullyConnectedLayer(400,'Name','fc3')
    reluLayer('Name','relu3')
    fullyConnectedLayer(300,'Name','fc4')
    reluLayer('Name','relu4')
    fullyConnectedLayer(1,'Name','fc5')];
actionPath = [
    featureInputLayer(numActions,'Normalization','none','Name','action')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
%criticNetwork = connectLayers(criticNetwork,'fc5','add/in2');
criticOptions = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1,'L2RegularizationFactor',1e-4); % L2 weight decay, per the paper
critic = rlQValueRepresentation(criticNetwork,observationInfo,actionInfo,...
    'Observation',{'observation'},'Action',{'action'},criticOptions);
But I am unsure about the 'additionLayer' and the 'actionPath'. Does my implementation match the paper's description?
Can anyone suggest?
Thanks.


Accepted Answer

Emmanouil Tzorakoleftherakis
Hello,
Does this paper use DDPG as well? Are there any images showing the network architecture? If it uses another algorithm, the critic may be implemented as a state-value network V(s).
DDPG uses a Q-network for the critic, which needs to take in both the state and the action (s,a). Reinforcement Learning Toolbox lets you implement this architecture by providing separate input "channels", or paths, for the state and the action. That allows you to use different layers in these two paths to extract features more efficiently. See, for example, the image below:
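To make the separate-path idea concrete, here is a sketch of such a critic following the usual Reinforcement Learning Toolbox DDPG pattern (layer names like 'CriticStateFC1' are illustrative, and the 400/300 sizes follow the paper). The key constraint is that the two branches feeding additionLayer must output vectors of the same length, here 300:

```matlab
% Separate state and action paths, merged by an additionLayer
statePath = [
    featureInputLayer(numObservations,'Normalization','none','Name','observation')
    fullyConnectedLayer(400,'Name','CriticStateFC1')
    reluLayer('Name','CriticStateRelu1')
    fullyConnectedLayer(300,'Name','CriticStateFC2')];
actionPath = [
    featureInputLayer(numActions,'Normalization','none','Name','action')
    fullyConnectedLayer(300,'Name','CriticActionFC1')]; % must match statePath output size
commonPath = [
    additionLayer(2,'Name','add')       % sums the 300-d state and action features
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(1,'Name','QValue')]; % scalar Q(s,a)
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
```

Note that in your attempt the additionLayer sits in the middle of statePath and its second input is never connected, so the graph is incomplete; the action path has to be wired into the addition layer as above.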
If you want, you can concatenate the observation and action inputs and use a common feature extraction path as follows:
% create a network to be used as underlying critic approximator
statePath = featureInputLayer(numObservations, 'Normalization', 'none', 'Name', 'state');
actionPath = featureInputLayer(numActions, 'Normalization', 'none', 'Name', 'action');
commonPath = [
    concatenationLayer(1,2,'Name','concat')
    fullyConnectedLayer(400,'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(300,'Name','CriticStateFC2')
    reluLayer('Name','CriticRelu2')
    fullyConnectedLayer(1,'Name','StateValue')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = addLayers(criticNetwork, commonPath);
criticNetwork = connectLayers(criticNetwork,'state','concat/in1');
criticNetwork = connectLayers(criticNetwork,'action','concat/in2');
plot(criticNetwork)
Hope that helps

  4 Comments

Emmanouil Tzorakoleftherakis
I will take a look and reply in your other post since this is a different question. If the answer to this question makes sense to you, make sure to accept it.
laha on 30 Nov 2020
Hello Emmanouil,
I tried training the agent, but it is performing quite poorly. I suspect this is a problem with the hyperparameter values, since I have not tuned anything. Now I have two questions:
  1. I am trying to understand the effects of the hyperparameters by reading some resources. Is there anything in MATLAB that can help with this, other than trial-and-error?
  2. How do I save the best-performing agent, given that I don't know the critical (reward) value in advance? Basically, I want to save the agent that achieves the maximum reward, or, say, the top-5 highest-rewarding agents.
Thanks.
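Regarding the second question, one relevant mechanism is the agent-saving options in rlTrainingOptions, which can save a copy of the agent to disk whenever a reward criterion is met. A sketch (the threshold of 100 and the episode limits are placeholder values to adjust for your problem):

```matlab
% Save any agent whose episode reward exceeds a threshold
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',5000,...
    'MaxStepsPerEpisode',500,...
    'SaveAgentCriteria','EpisodeReward',... % criterion checked after each episode
    'SaveAgentValue',100,...                % threshold for the criterion above
    'SaveAgentDirectory','savedAgents');    % .mat files are written here
trainingStats = train(agent,env,trainOpts);
```

You can then compare the saved agents' episode rewards in trainingStats and keep the best few.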


More Answers (0)
