How does Q-learning update the qTable when using the Reinforcement Learning Toolbox?

The 'MaxEpisodes' and 'MaxStepsPerEpisode' options are both set to 1.
I ran the following code. After the first episode, Q(4,1) is set to -1.
However, when I ran the "train section" again, both Q(4,1) and Q(4,2) were updated, as shown in the following figure.
In the second episode, action 2 is executed in state 4. Therefore, in my opinion, only Q(4,2) should be updated, and it should become -1.
Why is Q(4,2) set to 0.7441?
Why is Q(4,1) also updated, and set to -1.67?
clear
% Set up a 4x4 grid world starting at [2,1] with terminal state [4,4]
GW = createGridWorld(4,4);
GW.CurrentState = '[2,1]';
GW.TerminalStates = '[4,4]';
nS = numel(GW.States);
nA = numel(GW.Actions);
% Reward of -1 for every transition, +10 for reaching the terminal state
GW.R = -1*ones(nS,nS,nA);
GW.R(:,state2idx(GW,GW.TerminalStates),:) = 10;
env = rlMDPEnv(GW);
% Table-based critic for the Q agent
qTable = rlTable(getObservationInfo(env),getActionInfo(env));
critic = rlQValueRepresentation(qTable,getObservationInfo(env),getActionInfo(env));
critic.Options.LearnRate = 1;
agentOpt = rlQAgentOptions;
agentOpt.EpsilonGreedyExploration.Epsilon = 0.05;
agentOpt.DiscountFactor = 1;
agent = rlQAgent(critic, agentOpt);
plot(env)
env.Model.Viewer.ShowTrace = true;
env.Model.Viewer.clearTrace;
%% train section
rng(0)
% Train for a single episode of a single step
opt = rlTrainingOptions(...
    'MaxEpisodes',1,...
    'MaxStepsPerEpisode',1,...
    'StopTrainingCriteria',"AverageReward",...
    'Plots',"none",...
    'StopTrainingValue',480);
trainStats = train(agent,env,opt);
%%
aa = getLearnableParameters(getCritic(agent));
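For reference, with LearnRate = 1 and DiscountFactor = 1, I would expect a plain tabular Q-learning step on a non-terminal transition to give -1. A minimal sketch of that hand calculation (the variable names below are mine, not toolbox API):
alpha    = 1;    % LearnRate
gamma    = 1;    % DiscountFactor
Qold     = 0;    % Q(4,2) before the step (the table starts at zero)
r        = -1;   % reward for a non-terminal transition
QnextMax = 0;    % max over actions of Q at the next state (still zero)
Qnew = Qold + alpha*(r + gamma*QnextMax - Qold)  % expected result: -1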

Answers (1)

Emmanouil Tzorakoleftherakis
Can you try the following?
critic.Options.L2RegularizationFactor = 0;
This parameter is nonzero by default and is likely the reason for the discrepancy you are observing.
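For anyone reproducing this, one way to try it is to set the option before the agent is constructed, based on the script in the question (a sketch, not a verified fix):
critic.Options.L2RegularizationFactor = 0;  % default is nonzero
agent = rlQAgent(critic, agentOpt);         % rebuild the agent with the modified critic options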
  2 Comments
Tracy Shang on 4 May 2021 (edited)
Thanks for your answer!
I tried the code you suggested. The result showed no difference.
But you inspired me!
I tried another parameter, as follows. The qTable was updated as shown in the following figure.
critic.Options.OptimizerParameters.GradientDecayFactor = 0;
I then tried both parameters by adding the following code, and the qTable was updated as shown in the following figure. At least, the question about Q(4,1) is solved.
According to the parameters I set (LearnRate = 1, DiscountFactor = 1), the equation for calculating the Q-value simplifies as follows.
That is, Q(s,a) = r + max_a' Q(s',a').
Why is Q(4,2) set to -1.4139?
critic.Options.OptimizerParameters.GradientDecayFactor =0;
critic.Options.L2RegularizationFactor=0;
Looking forward to your further answer. Thank you very much!
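If it helps, the updated entries can also be listed programmatically instead of reading them off the figure; a small sketch, assuming the learnable parameters come back as a cell array, as in the aa variable above:
% List which Q-table rows changed from their initial zeros (sketch)
qVals = getLearnableParameters(getCritic(agent));
Q = qVals{1};                             % assumed: nS-by-nA table of Q-values
changedStates = find(any(Q ~= 0, 2));     % state indices with at least one updated action value
idx2state(GW, changedStates)              % map indices back to grid-world states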
