Action value can't be constrained
I am a beginner to RL and am now trying to use a policy gradient agent. Here is something weird I found when trying to keep the action output within a certain range.
In the Create Continuous Stochastic Actor from Deep Neural Network of this link:
The action value limits are set first in rlNumericSpec(), but the constraint seems to have no effect on the actual actor output. Even if I change the lower limit to 0, the actor still yields negative values.
My question is: to actually keep the action output within range, do I need to achieve this through the network construction? Say I want a range of 0 to 5, how should I modify the network then?
BTW, why does the output of the neural network need to have twice as many elements as the actual action? What is happening inside rlStochasticActorRepresentation()?
Answers (1)
Asvin Kumar
on 6 Aug 2020
For your first question:
Have a look at the discussion here. https://www.mathworks.com/matlabcentral/answers/515602-incorrect-tanhlayer-output-in-rl-agent#answer_425717
In short, it might be because of the noise added to the predicted action. If I'm not wrong, you should be able to modify the properties of the noise in such a way that it doesn't affect your range.
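On the network-construction route the question asks about, a common pattern is to end the action path with a tanhLayer followed by a scalingLayer, which hard-bounds that path's output. A sketch under assumptions: the observation size (4), hidden width (64), and layer names are made up for illustration, and scalingLayer is assumed available from Reinforcement Learning Toolbox. For a target range of [0, 5], use Scale = 2.5 and Bias = 2.5:

```matlab
% Sketch: map an unbounded network output into the range (0, 5).
% tanh outputs values in (-1, 1); scalingLayer computes Scale .* x + Bias,
% so 2.5 * tanh(x) + 2.5 lies in (0, 5).
layers = [
    featureInputLayer(4, 'Name', 'state')         % hypothetical 4-D observation
    fullyConnectedLayer(64, 'Name', 'fc1')
    reluLayer('Name', 'relu1')
    fullyConnectedLayer(1, 'Name', 'fc_out')      % one action dimension
    tanhLayer('Name', 'tanh')
    scalingLayer('Name', 'bounded', 'Scale', 2.5, 'Bias', 2.5)
];
```

Note that for a stochastic actor this only bounds the path it is attached to (e.g. the mean); the sampled action can still fall outside the range, as discussed below.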
For your second question:
The documentation for rlStochasticActorRepresentation says that the network's output layer must have twice as many elements as the number of dimensions of the continuous action space. These outputs represent all the mean values, followed by all the variances (which must be non-negative), of the Gaussian distributions over the action-space dimensions.
The reason for the mean and variance is the nature of stochastic actors. From the description of rlStochasticActorRepresentation, a stochastic actor takes the observations as inputs and returns a random action, thereby implementing a stochastic policy with a specific probability distribution. This random action is sampled from the Gaussian distribution described by the mean and variance.
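This also explains why the rlNumericSpec limits alone don't clamp the output: a Gaussian has unbounded support, so a sample can land outside any finite limits even when the mean is inside them. A minimal sketch (values are arbitrary):

```matlab
% A sample from N(mu, var) can be negative even when mu > 0.
mu   = 0.1;                          % mean inside the desired range [0, 5]
var_ = 1.0;                          % variance of the Gaussian
a = mu + sqrt(var_) * randn(1, 5);   % five sampled actions
% With mu near 0 and variance 1, some of these samples will typically be
% negative, which is why a lower limit of 0 in rlNumericSpec is not enforced
% on the sampled action.
```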