Reward in training manager higher than it should be

Hi,
I am trying to train a reinforcement learning agent, and I have the environment set up in Simulink. I'm facing two issues:
1- The reward in the training manager appears to be much higher than it should be. As shown in the picture below, the scope connected to the reward signal shows a reward value of 1, which is correct. However, in the training manager it is 70, which is not correct.
2- After a number of episodes, the training stops and I get an error message:
Error using rl.env.AbstractEnv/simWithPolicy (line 82)
An error occurred while simulating "ADSTestBed" with the agent "falsifier_agent".
Error in rl.task.SeriesTrainTask/runImpl (line 33)
[varargout{1},varargout{2}] = simWithPolicy(this.Env,this.Agent,simOpts);
Error in rl.task.Task/run (line 21)
[varargout{1:nargout}] = runImpl(this);
Error in rl.task.TaskSpec/internal_run (line 166)
[varargout{1:nargout}] = run(task);
Error in rl.task.TaskSpec/runDirect (line 170)
[this.Outputs{1:getNumOutputs(this)}] = internal_run(this);
Error in rl.task.TaskSpec/runScalarTask (line 194)
runDirect(this);
Error in rl.task.TaskSpec/run (line 69)
runScalarTask(task);
Error in rl.train.SeriesTrainer/run (line 24)
run(seriestaskspec);
Error in rl.train.TrainingManager/train (line 421)
run(trainer);
Error in rl.train.TrainingManager/run (line 211)
train(this);
Error in rl.agent.AbstractAgent/train (line 78)
TrainingStatistics = run(trainMgr);
Error in ADSTestBedScript (line 121)
trainingStats = train(falsifier_agent,env,trainOpts);
Caused by:
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
Invalid input argument type or size such as observation, reward, isdone or loggedSignals.
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
Unable to compute gradient from representation.
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
Error using 'backwardLoss' in Layer rl.layer.FcnLossLayer. The function threw an
error and could not be executed.
Error using rl.env.SimulinkEnvWithAgent>localHandleSimoutErrors (line 681)
Number of elements must not change. Use [] as one of the size inputs to
automatically calculate the appropriate size for that dimension.
I should mention that I have another agent in the Simulink model, but that agent is not being trained.
MATLAB version: R2020b
Any help is appreciated. Thanks

Accepted Answer

Mohammed Eleffendi on 18 Mar 2021
For the first issue, the reward in the training manager is the cumulative episode reward, whereas the reward in the scope is a plot of the reward at every time step. So the reward in the training manager is correct; there is no issue here.
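As a worked example with made-up numbers: a constant step reward of 1 accumulated over a 70-step episode gives exactly the value you saw in the training manager:

% Illustration only; the step count of 70 is an assumption chosen to match the reported value.
stepReward = 1;        % value shown on the scope at each time step
numSteps = 70;         % e.g. Tf = 7 s with a sample time Ts = 0.1 s
episodeReward = stepReward * numSteps   % = 70, the cumulative episode reward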
For the second issue, it turns out that if you have 'UseDevice' set to 'gpu' you will encounter this error. Change it to 'cpu' and the error disappears. Support is investigating what is causing this issue.
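A minimal sketch of where that option lives, assuming the critic was created with rlRepresentationOptions (the network variable and signal names below, criticNet, 'state', and 'action', are placeholders for your own):

% Force the representation onto the CPU instead of the GPU
criticOpts = rlRepresentationOptions('UseDevice','cpu');
% Pass the options when constructing the representation; substitute your own
% network, specs, and port names here
critic = rlQValueRepresentation(criticNet,obsInfo,actInfo, ...
    'Observation',{'state'},'Action',{'action'},criticOpts);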

More Answers (1)

Emmanouil Tzorakoleftherakis
I cannot be sure about the error, but it seems that somewhere in your setup you are changing the number of parameters/inputs at run time (check the inputs to the RL Agent block; see the sketch below).
For your first question, the individual reward at each time step is different from the episode reward shown in the Episode Manager. The latter sums up the individual rewards over all time steps of an episode.
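One way to run that check, assuming your environment object is named env: compare the environment specs against the signals wired into the RL Agent block, and make sure their sizes stay fixed for the whole episode.

obsInfo = getObservationInfo(env);   % spec for the observation port
actInfo = getActionInfo(env);        % spec for the action port
obsInfo.Dimension   % must match the size of the observation signal
actInfo.Dimension   % must match the size of the action signal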
Gaurav Shetty on 14 Sep 2021
Setting UseDevice to 'cpu' didn't work for me. I am using the MATLAB R2021a version.
