Reinforcement Learning Noise Model Mean Attraction Constant

What does the mean attraction constant do? How can I tune it properly to promote exploration and learning? I can't seem to get the logic behind it.
With a sample time of 2, when I set it to 1 I get very noisy outputs. In the following graphs, rpm and valve%opening are the agent's outputs, and they are already scaled by a scaling layer.
When I set it to 0.05, it seems like the noise model is not doing much exploration.
I also noticed that when applying the formula abs(1 - MeanAttractionConstant.*SampleTime):
When the sample time is 2 and the MAC is 1, the formula gives 1.
When the sample time is 2 and the MAC is 0.05, the formula gives 0.9.
How does this relate to how fast the noise converges to the mean?
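For context on that formula: the toolbox documents the Ornstein-Uhlenbeck noise update as roughly x(k) = x(k-1) + MAC*(Mean - x(k-1))*Ts + Variance*randn*sqrt(Ts), so the deterministic part contracts toward the mean by a factor of (1 - MAC*Ts) each step. With Ts = 2 and MAC = 1 that factor is -1 (the noise flips sign every step and never settles, hence very noisy outputs), while MAC = 0.05 gives 0.9 (slow geometric decay toward the mean). A minimal sketch of this update, in Python rather than MATLAB since the arithmetic is language-neutral (the function name and start value are mine):

```python
import numpy as np

def ou_noise(mac, ts, mean=0.0, sigma=1.0, steps=50, x0=5.0, seed=0):
    """Discretized Ornstein-Uhlenbeck noise:
    x[k+1] = x[k] + mac*(mean - x[k])*ts + sigma*randn()*sqrt(ts)"""
    rng = np.random.default_rng(seed)
    x = np.empty(steps)
    x[0] = x0  # start away from the mean to see the attraction
    for k in range(steps - 1):
        x[k + 1] = (x[k] + mac * (mean - x[k]) * ts
                    + sigma * rng.standard_normal() * np.sqrt(ts))
    return x

# Per-step contraction of the deterministic part: |1 - mac*ts|
print(abs(1 - 1.0 * 2))   # 1.0 -> the pull to the mean never damps the noise
print(abs(1 - 0.05 * 2))  # 0.9 -> slow geometric decay toward the mean
```

Setting sigma = 0 isolates the mean attraction: with MAC = 0.05 the sequence shrinks by 0.9 per step, while with MAC = 1 it just oscillates at constant amplitude.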
Thank you very much.

Accepted Answer

Assuming you are using DDPG, there is some information on the noise model here. I wouldn't worry too much about the mean attraction constant. The values of Variance, VarianceDecayRate, and VarianceMin play a much bigger role in 1) how much noise is added to the agent output and 2) for how long. If you want less noise to be added, reduce the Variance value. If you want to explore for a longer time, reduce the decay rate and set VarianceMin to a larger value.
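To put numbers on the "for how long" part: assuming the variance decays geometrically at every sample step, Variance <- Variance*(1 - VarianceDecayRate), until it reaches the VarianceMin floor (the decay rule described for the noise model), you can estimate how many steps of meaningful exploration a given decay rate buys. A rough Python sketch (function name and the example values are illustrative, not from the thread):

```python
def steps_until_floor(variance, decay_rate, variance_min):
    """Steps until the decayed variance reaches its floor, assuming
    variance <- variance * (1 - decay_rate) at every sample step."""
    k = 0
    while variance > variance_min:
        variance *= (1 - decay_rate)
        k += 1
    return k

# A 100x smaller decay rate stretches exploration roughly 100x longer
print(steps_until_floor(0.3, 1e-3, 0.01))  # on the order of 3e3 steps
print(steps_until_floor(0.3, 1e-5, 0.01))  # on the order of 3e5 steps
```

This is why lowering the decay rate (or raising VarianceMin) keeps the agent exploring for more of the training run.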

4 Comments

Thank you very much for answering my question again Emmanouil!
When I continue training the agent using the second setting (MAC = 0.15; variance = 10% of range/sqrt(Ts)), I get this plot at 4000 episodes, when the maximum is 10000.
Due to the limited RAM I have (32 GB), I have to set the experience buffer to 1.5e3 only. The noise decay is around 1e-5.
Does this mean that the training has arrived at a suboptimal solution, where it chose to breach the constraints (get to isdone == 1) as fast as possible? The problem has 16 observations (normalised through division by the mean) and 2 outputs. The actions at this stage look like these 2 images:
There is almost no exploration for the second action (valve%opening), and the first action (rpm) just tends to stay at the limits.
Please advise me on what I can try to improve the learning performance.
Thank you very much for your time and effort.
First off, there is no guarantee that the episode reward will always keep rising on average. As the agent explores, it may reach a peak, and then move to explore a different area that generates potentially lower rewards.
What I would do first based on the plot above is to save all the agents that have average reward over zero and see how they perform. If they are good, you have your stopping criteria. The agents towards the final episodes in the plot above don't seem like good candidates so there is no point in looking into them.
Hi Emmanouil,
I tried that and tweaked the training hyperparameters a little to arrive at a suboptimal solution!
Thank you very much for your help!
I am using DDPG and I need to set my sample time to 800 sec, but then I got an error:
abs(1 - mean attraction constant.*sample time) <= 1
So I set the mean attraction constant (MAC) to 0.0001, but I am still getting the same error.
My question is: do I have to change the MAC in the noise options of the agent, or is it some different mean attraction constant?
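For reference, the constraint in that error, abs(1 - MAC*Ts) <= 1, rearranges to 0 <= MAC <= 2/Ts for a positive sample time, so with Ts = 800 any MAC up to 0.0025 should satisfy it. A quick Python sketch of the arithmetic (the function name is mine):

```python
def mac_is_valid(mac, ts):
    """Check the constraint abs(1 - mac*ts) <= 1,
    which is equivalent to 0 <= mac <= 2/ts for positive ts."""
    return abs(1 - mac * ts) <= 1

print(mac_is_valid(0.0001, 800))  # True  (0.0001 <= 2/800 = 0.0025)
print(mac_is_valid(0.01, 800))    # False (0.01 exceeds 0.0025)
```

Since 0.0001 satisfies the inequality for Ts = 800, a persisting error suggests the value being checked is not the one that was edited.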

Release

R2020b
