What is the best activation function to get action between 0 and 1 in DDPG network?


Sayak Mukherjee on 13 Oct 2020

Direct link to this question

https://au.mathworks.com/matlabcentral/answers/613031-what-is-the-best-activation-function-to-get-action-between-0-and-1-in-ddpg-network

Commented: awcii on 28 Jul 2023

Accepted Answer: Emmanouil Tzorakoleftherakis

I am using a DDPG agent to run a control algorithm whose inputs (the actions of the RL agent, 23 in total) vary between 0 and 1. I am defining this using rlNumericSpec:

actInfo = rlNumericSpec([numAct 1],'LowerLimit',0,'UpperLimit', 1);

Then I am using a tanhLayer as the last layer of the actor network (similar to the bipedal robot example) and creating the actor with:

actorOptions = rlRepresentationOptions('Optimizer','adam','LearnRate',1e-4, 'GradientThreshold',1,'L2RegularizationFactor',1e-5);
actor = rlRepresentation(actorNetwork,env.getObservationInfo,env.getActionInfo, 'Observation',{'observation'},  'Action',{'ActorTanh1'},actorOptions);

But I feel that the model is only taking the extreme values, i.e. mostly 0 and 1.

Would it be better to use a sigmoid function to get better action estimates?

Accepted Answer

Emmanouil Tzorakoleftherakis on 15 Oct 2020

Direct link to this answer

https://au.mathworks.com/matlabcentral/answers/613031-what-is-the-best-activation-function-to-get-action-between-0-and-1-in-ddpg-network#answer_514888

Hello,

With DDPG, a common choice for the final three layers of the actor is a fully connected layer, a tanh layer and a scaling layer. The tanh layer bounds the output between -1 and 1, and the scaling layer then scales and shifts those values as needed to match the specifications of the actuator in your problem.
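As a rough sketch of that pattern for the [0, 1] action range in the question: the tanh output in [-1, 1] can be mapped with a scale of 0.5 and a bias of 0.5. This assumes Reinforcement Learning Toolbox is installed (for scalingLayer). The layer name 'ActorScaling1' and the variables scale and bias are illustrative and not from the original post.

```matlab
% Sketch: final three actor layers for actions in [0, 1].
% Assumes Reinforcement Learning Toolbox; names below are illustrative.
numAct = 23;                       % number of actions, as in the question
lowerLimit = 0;  upperLimit = 1;   % action limits from rlNumericSpec

% tanh output lies in [-1, 1]; map it linearly onto [lowerLimit, upperLimit]
scale = (upperLimit - lowerLimit)/2;   % half the width of the action range
bias  = (upperLimit + lowerLimit)/2;   % midpoint of the action range

actorTail = [
    fullyConnectedLayer(numAct,'Name','ActorFC3')
    tanhLayer('Name','ActorTanh1')
    scalingLayer('Name','ActorScaling1','Scale',scale,'Bias',bias)
    ];
```

With a tail like this, the actor's action output would be the scaling layer ('ActorScaling1' here) rather than the tanh layer.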

It seems the problem here is caused by the noise that DDPG adds during training to allow sufficient exploration (for example, see step 1 here). The default noise options have a pretty high variance, so when the noise is added to the output of the tanh layer, the result ends up outside the [0, 1] range and is clipped. This is why you are only getting the two extremes.
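The clipping effect can be illustrated with a small numerical experiment in base MATLAB. This is only a sketch under stated assumptions: plain Gaussian noise stands in for the agent's actual noise model, the actor outputs are made-up values already inside [0, 1], and the two noise levels are arbitrary.

```matlab
% Sketch: large exploration noise plus clipping piles actions up at the limits.
% Gaussian noise is a stand-in for the agent's noise model (an assumption).
n = 10000;                               % number of sampled actions
a = 0.5*tanh(randn(n,1)) + 0.5;          % hypothetical actor outputs inside [0, 1]

sigmas = [0.05 1.0];                     % small vs. large noise standard deviation
fracAtLimit = zeros(size(sigmas));
for k = 1:numel(sigmas)
    noisy   = a + sigmas(k)*randn(n,1);  % add exploration noise
    clipped = min(max(noisy,0),1);       % saturate to the action limits
    fracAtLimit(k) = mean(clipped == 0 | clipped == 1);
end
disp(fracAtLimit)                        % fraction of actions sitting exactly at 0 or 1
```

With the larger noise level, most of the clipped actions land exactly on 0 or 1, which matches the behaviour described in the question.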

Try adjusting the DDPG noise options, and particularly the variance (make it smaller, e.g. <=0.1). Also, see here for some best practices when choosing noise parameters.
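A minimal sketch of that adjustment, using the noise property names from the R2020a release listed for this thread. It assumes Reinforcement Learning Toolbox. The sample time and decay rate are illustrative values, not recommendations from the answer. Newer releases may use different property names, so check the documentation for your release.

```matlab
% Sketch: lowering the DDPG exploration noise (property names as in R2020a).
% Assumes Reinforcement Learning Toolbox; numeric values are illustrative.
agentOptions = rlDDPGAgentOptions('SampleTime',0.1);   % hypothetical sample time
agentOptions.NoiseOptions.Variance = 0.1;              % reduced noise variance, as suggested
agentOptions.NoiseOptions.VarianceDecayRate = 1e-5;    % let the noise shrink slowly during training
```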

Hope that helps

12 Comments

Sayak Mukherjee on 15 Oct 2020

Edited: Sayak Mukherjee on 15 Oct 2020

I should have been clearer. This is how I define the action specification and the environment:

actInfo = rlNumericSpec([numAct 1],'LowerLimit',0,'UpperLimit', 1);
actInfo.Name = 'STIM';
env = rlSimulinkEnv(mdl,blk,obsInfo,actInfo);

And then I am defining the actor network:

actorNetwork = [
    imageInputLayer([numObs 1 1],'Normalization','none','Name','observation')
    fullyConnectedLayer(actorLayerSizes(1), 'Name', 'ActorFC1', ...
            'Weights',2/sqrt(numObs)*(rand(actorLayerSizes(1),numObs)-0.5), ... 
            'Bias',2/sqrt(numObs)*(rand(actorLayerSizes(1),1)-0.5))
    reluLayer('Name', 'ActorRelu1')
    fullyConnectedLayer(actorLayerSizes(2), 'Name', 'ActorFC2', ... 
            'Weights',2/sqrt(actorLayerSizes(1))*(rand(actorLayerSizes(2),actorLayerSizes(1))-0.5), ... 
            'Bias',2/sqrt(actorLayerSizes(1))*(rand(actorLayerSizes(2),1)-0.5))
    reluLayer('Name', 'ActorRelu2')
    fullyConnectedLayer(numAct, 'Name', 'ActorFC3', ... 
            'Weights',2*5e-3*(rand(numAct,actorLayerSizes(2))-0.5), ... 
            'Bias',2*5e-5*(rand(numAct,1)-0.5))                       
    tanhLayer('Name','ActorTanh1')
    ];
% Create actor representation
actorOptions = rlRepresentationOptions('Optimizer','adam','LearnRate',1e-4, ...
                                       'GradientThreshold',1,'L2RegularizationFactor',1e-5);
actor = rlRepresentation(actorNetwork,env.getObservationInfo,env.getActionInfo, ... 
                         'Observation',{'observation'}, ...
                         'Action',{'ActorTanh1'},actorOptions);

So my question is: do I need a separate scaling layer after the tanh layer even though I have defined the lower limit as 0 in actInfo? My actions fluctuated between -1 and 1 with this architecture. If I use a sigmoid function instead, I get actions between 0 and 1.
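On the sigmoid alternative raised here, a quick numerical check in base MATLAB shows that a tanh output scaled by 0.5 and shifted by 0.5 traces the same curve as a logistic sigmoid evaluated at twice the input, so both map onto [0, 1]. This is only a sketch of the math; it says nothing about how the toolbox treats the limits in actInfo, and the variable names are illustrative.

```matlab
% Sketch: compare a scaled-and-shifted tanh with a sigmoid on the same inputs.
x = linspace(-5,5,101)';

scaledTanh = 0.5*tanh(x) + 0.5;     % tanh output scaled and shifted into [0, 1]
sigmoid2x  = 1./(1 + exp(-2*x));    % logistic sigmoid evaluated at 2*x

maxDiff = max(abs(scaledTanh - sigmoid2x));   % differs only by rounding error
```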

Sayak Mukherjee on 15 Oct 2020

thanks

awcii on 28 Jul 2023

@Sayak Mukherjee What about your problem? Did you solve it?

Release

R2020a
