SAC agent actor network setup and action generation

Hi, I'm trying to develop a SAC agent for a continuous control task with 2 actions. The agent's explored actions look like this:
The first action fluctuates between the maximum and minimum, while the second action seems to be exploring the action space well. It is worth noting that the ranges and magnitudes of the two actions differ significantly, but I have normalized them at the critic input.
When I attempted this using deterministic agents, I used a tanhLayer and a scalingLayer to normalize the action and scale it. The SAC documentation here suggests that the output tanhLayer and scalingLayer are added automatically, even though they do not show up in the actor network structure.
The documentation also states: 'Do not add a tanhLayer or scalingLayer in the mean output path. The SAC agent internally transforms the unbounded Gaussian distribution to the bounded distribution to compute the probability density function and entropy properly.'
However, the behaviour where a tanhLayer always outputs -1 or 1 (as it does for the first action) isn't very logical to me. Do I have to add the tanhLayer and scalingLayer manually for this to work correctly? Is there any reason why the first action only fluctuates between -1 and 1 without exploring the actions in between?
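For reference, a minimal sketch of the kind of actor layer graph the SAC documentation describes (the layer sizes, layer names, and numObs/numActions values below are placeholders for illustration, not my actual setup): the mean path ends in a plain fullyConnectedLayer, while the standard-deviation path ends in a softplusLayer so its output stays positive.
% Minimal sketch of a SAC actor layer graph (hypothetical sizes and names)
numObs = 4;       % hypothetical observation dimension
numActions = 2;   % two continuous actions
% Common observation path
obsPath = [
    featureInputLayer(numObs,'Normalization','none','Name','obs')
    fullyConnectedLayer(128,'Name','fcObs')
    reluLayer('Name','reluObs')];
% Mean path: ends in a plain fullyConnectedLayer, no tanhLayer or scalingLayer
meanPath = [
    fullyConnectedLayer(64,'Name','fcMean')
    reluLayer('Name','reluMean')
    fullyConnectedLayer(numActions,'Name','mean')];
% Standard-deviation path: ends in a softplusLayer so the output is nonnegative
stdPath = [
    fullyConnectedLayer(64,'Name','fcStd')
    reluLayer('Name','reluStd')
    fullyConnectedLayer(numActions,'Name','fcStdOut')
    softplusLayer('Name','std')];
actorNet = layerGraph(obsPath);
actorNet = addLayers(actorNet,meanPath);
actorNet = addLayers(actorNet,stdPath);
actorNet = connectLayers(actorNet,'reluObs','fcMean');
actorNet = connectLayers(actorNet,'reluObs','fcStd');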
  1 Comment
Takeshi Takahashi on 9 Apr 2021
Adding a tanh layer and a scaling layer to the mean path is unnecessary since the SAC agent applies tanh and scaling internally based on the action spec.
The first action's range is much larger than the second's, which might be causing the exploration issue. The standard deviation produced by the network for the first action is probably too big.
I suggest the following:
  1. Use a small EntropyWeightOptions.EntropyWeight in rlSACAgentOptions, such as 0.01. This weight is learned automatically during training, but it can take a long time to come down if the initial EntropyWeight is too big.
  2. You can add a tanh layer and a scaling layer to the standard deviation path to directly limit the action's uncertainty (see the sketch after this list).
  3. If none of the above works, it would be better to normalize the actions in the environment. You can set the same range for all actions in the action spec and scale them appropriately inside the environment. Because SAC relies on entropy for exploration, having similar action ranges works better.
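A minimal sketch of suggestions 1 and 2 (the 0.01 weight is the value mentioned above; numActions, maxStd, and the layer names are illustrative placeholders):
% 1. Start from a small entropy weight; it is still adapted during training
agentOpts = rlSACAgentOptions;
agentOpts.EntropyWeightOptions.EntropyWeight = 0.01;
% 2. Bound the standard-deviation path with tanh + scaling so the predicted
%    standard deviation cannot become arbitrarily large
numActions = 2;   % as in the question
maxStd = 0.5;     % hypothetical upper bound on the standard deviation
stdPath = [
    fullyConnectedLayer(numActions,'Name','fcStdOut')
    tanhLayer('Name','tanhStd')                                   % output in (-1,1)
    scalingLayer('Name','std','Scale',maxStd/2,'Bias',maxStd/2)]; % maps to (0,maxStd)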


Answers (1)

Sampson Nwachukwu on 10 Jan 2023
Hi,
I am facing a similar challenge.
I have an action space specified as:
numActions = 1;
actionInfo = rlNumericSpec([numActions 1], ...
    "LowerLimit",-0.1,"UpperLimit",0.1);
Setting TargetEntropy = -3 or -5 gives a better training curve, although I do not achieve an optimal result. However, when I set it to -1 or allow the program to choose it automatically, I end up with a very bad training curve and a poor result. I have tried different temperature coefficients, but I am still getting the same result.
Your assistance will be appreciated. Thank you.
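For reference, I set the target entropy through the agent options roughly like this (the -3 value is from the runs above; the LearnRate value is only an illustrative guess at the temperature-tuning rate):
agentOpts = rlSACAgentOptions;
agentOpts.EntropyWeightOptions.TargetEntropy = -3;   % fixed target entropy
agentOpts.EntropyWeightOptions.LearnRate = 3e-4;     % illustrative rate for tuning the entropy weight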
  1 Comment
Sampson Nwachukwu on 10 Jan 2023
In addition to the question above, is there a way to set the temperature coefficient of SAC automatically in MATLAB?
Thank you.


Release

R2020b
