SAC agent actor network setup and action generation

Hi, I'm trying to develop a SAC agent for a continuous control task with 2 actions. The agent's explored actions look like this:
The first action fluctuates between the maximum and minimum, while the second action seems to be exploring the action space well. It is worth noting that the ranges and magnitudes of the two actions differ significantly, but I have normalized them at the critic input.
When I attempted this using deterministic agents, I used a tanhLayer and a scalingLayer to normalize the action and scale it. The SAC documentation here suggests that the output tanhLayer and scalingLayer are added automatically, even though they do not show up in the actor network structure.
The documentation also states: 'Do not add a tanhLayer or scalingLayer in the mean output path. The SAC agent internally transforms the unbounded Gaussian distribution to the bounded distribution to compute the probability density function and entropy properly.'
However, the behaviour where a tanhLayer always outputs -1 or 1 (as it does for the first action) isn't very logical to me. Do I have to add the tanhLayer and scalingLayer manually for this to work correctly? Is there any reason why the first action only fluctuates between -1 and 1 without exploring the actions in between?
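For reference, a minimal sketch of the kind of actor layer graph the SAC documentation describes (the layer sizes, layer names, and numObs/numActions values below are placeholders for illustration, not my actual setup): the mean path ends in a plain fullyConnectedLayer, while the standard-deviation path ends in a softplusLayer so its output stays positive.
% Minimal sketch of a SAC actor layer graph (hypothetical sizes and names)
numObs = 4;       % hypothetical observation dimension
numActions = 2;   % two continuous actions
% Common observation path
obsPath = [
    featureInputLayer(numObs,'Normalization','none','Name','obs')
    fullyConnectedLayer(128,'Name','fcObs')
    reluLayer('Name','reluObs')];
% Mean path: ends in a plain fullyConnectedLayer, no tanhLayer or scalingLayer
meanPath = [
    fullyConnectedLayer(64,'Name','fcMean')
    reluLayer('Name','reluMean')
    fullyConnectedLayer(numActions,'Name','mean')];
% Standard-deviation path: ends in a softplusLayer so the output is nonnegative
stdPath = [
    fullyConnectedLayer(64,'Name','fcStd')
    reluLayer('Name','reluStd')
    fullyConnectedLayer(numActions,'Name','fcStdOut')
    softplusLayer('Name','std')];
actorNet = layerGraph(obsPath);
actorNet = addLayers(actorNet,meanPath);
actorNet = addLayers(actorNet,stdPath);
actorNet = connectLayers(actorNet,'reluObs','fcMean');
actorNet = connectLayers(actorNet,'reluObs','fcStd');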
  1 Comment
Takeshi Takahashi on 9 Apr 2021
Adding a tanh layer and a scaling layer to the mean path is unnecessary since the SAC agent applies tanh and scaling internally based on the action spec.
The first action's range is much larger than the second's, which might be causing the exploration issue. The standard deviation produced by the network for the first action is probably too big.
I suggest the following:
  1. Use a small EntropyWeightOptions.EntropyWeight in rlSACAgentOptions, such as 0.01. This weight is learned automatically during training, but it can take a long time to come down if the initial EntropyWeight is too big.
  2. You can add a tanh layer and a scaling layer to the standard deviation path to directly limit the action's uncertainty (see the sketch after this list).
  3. If none of the above works, it would be better to normalize the actions in the environment. You can set the same range for all actions in the action spec and scale them appropriately inside the environment. Because SAC relies on entropy for exploration, having similar action ranges works better.
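A minimal sketch of suggestions 1 and 2 (the 0.01 weight is the value mentioned above; numActions, maxStd, and the layer names are illustrative placeholders):
% 1. Start from a small entropy weight; it is still adapted during training
agentOpts = rlSACAgentOptions;
agentOpts.EntropyWeightOptions.EntropyWeight = 0.01;
% 2. Bound the standard-deviation path with tanh + scaling so the predicted
%    standard deviation cannot become arbitrarily large
numActions = 2;   % as in the question
maxStd = 0.5;     % hypothetical upper bound on the standard deviation
stdPath = [
    fullyConnectedLayer(numActions,'Name','fcStdOut')
    tanhLayer('Name','tanhStd')                                   % output in (-1,1)
    scalingLayer('Name','std','Scale',maxStd/2,'Bias',maxStd/2)]; % maps to (0,maxStd)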


Answers (1)

Sampson Nwachukwu on 10 Jan 2023
Hi,
I am facing a similar challenge.
I have an action space specified as:
numActions = 1;
actionInfo = rlNumericSpec([numActions 1], ...
    "LowerLimit",-0.1,"UpperLimit",0.1);
Setting TargetEntropy = -3 or -5 gives a better training curve, although I do not achieve an optimal result. However, when I set it to -1 or allow the program to choose it automatically, I end up with a very bad training curve and a poor result. I have tried different temperature coefficients, but I am still getting the same result.
Your assistance will be appreciated. Thank you.
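For reference, I set the target entropy through the agent options roughly like this (the -3 value is from the runs above; the LearnRate value is only an illustrative guess at the temperature-tuning rate):
agentOpts = rlSACAgentOptions;
agentOpts.EntropyWeightOptions.TargetEntropy = -3;   % fixed target entropy
agentOpts.EntropyWeightOptions.LearnRate = 3e-4;     % illustrative rate for tuning the entropy weight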
  1 Comment
Sampson Nwachukwu on 10 Jan 2023
In addition to the question above, is there a way to set the temperature coefficient of SAC automatically in MATLAB?
Thank you.


Release

R2020b
