
Specify Training Options in Reinforcement Learning Designer

To configure the training of an agent in the Reinforcement Learning Designer app, specify training options on the Train tab.


Specify Basic Options

On the Train tab, you can specify the following basic training options.

Max Episodes
Maximum number of episodes to train the agent, specified as a positive integer.

Max Episode Length
Maximum number of steps to run per episode, specified as a positive integer.

Stopping Criteria

Training termination condition, specified as one of the following values.

  • AverageSteps — Stop training when the running average number of steps per episode equals or exceeds the critical value specified by Stopping Value.

  • AverageReward — Stop training when the running average reward equals or exceeds the critical value.

  • EpisodeReward — Stop training when the reward in the current episode equals or exceeds the critical value.

  • GlobalStepCount — Stop training when the total number of steps in all episodes (the total number of times the agent is invoked) equals or exceeds the critical value.

  • EpisodeCount — Stop training when the number of training episodes equals or exceeds the critical value.

Stopping Value
Critical value of the training termination condition in Stopping Criteria, specified as a scalar.

Average Window Length
Window length for averaging the scores, rewards, and number of steps for the agent when either Stopping Criteria or Save agent criteria specifies an averaging condition.
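
These basic options correspond to properties of the rlTrainingOptions object in Reinforcement Learning Toolbox, so you can reproduce the same configuration at the command line. The following sketch uses illustrative values (500 episodes, a reward threshold of 480, and so on); adjust them for your own agent and environment.

% Command-line equivalent of the basic training options (illustrative values).
trainOpts = rlTrainingOptions( ...
    "MaxEpisodes",500, ...                      % Max Episodes
    "MaxStepsPerEpisode",200, ...               % Max Episode Length
    "StopTrainingCriteria","AverageReward", ... % Stopping Criteria
    "StopTrainingValue",480, ...                % Stopping Value
    "ScoreAveragingWindowLength",20);           % Average Window Length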

Specify Additional Options

To specify additional training options, on the Train tab, click More Options.

In the More Training Options dialog box, you can specify the following options.

Save agent criteria

Condition for saving agents during training, specified as one of the following values.

  • none — Do not save any agents during training.

  • AverageSteps — Save the agent when the running average number of steps per episode equals or exceeds the critical value specified by Save agent value.

  • AverageReward — Save the agent when the running average reward equals or exceeds the critical value.

  • EpisodeReward — Save the agent when the reward in the current episode equals or exceeds the critical value.

  • GlobalStepCount — Save the agent when the total number of steps in all episodes (the total number of times the agent is invoked) equals or exceeds the critical value.

  • EpisodeCount — Save the agent when the number of training episodes equals or exceeds the critical value.

Save agent value
Critical value of the save agent condition in Save agent criteria, specified as a scalar or "none".

Save directory

Folder for saved agents. If you specify a name and the folder does not exist, the app creates the folder in the current working directory.

To interactively select a folder, click Browse.

Show verbose output
Select this option to display training progress at the command line.

Stop on Error
Select this option to stop training when an error occurs during an episode.

Training plot

Option to graphically display the training progress in the app, specified as one of the following values.

  • training-progress — Show training progress

  • none — Do not show training progress
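
At the command line, these additional options map to the corresponding rlTrainingOptions properties. In the following sketch the reward threshold and the savedAgents folder name are illustrative assumptions.

% Additional training options (illustrative values).
trainOpts.SaveAgentCriteria  = "EpisodeReward";     % Save agent criteria
trainOpts.SaveAgentValue     = 500;                 % Save agent value
trainOpts.SaveAgentDirectory = "savedAgents";       % Save directory
trainOpts.Verbose            = true;                % Show verbose output
trainOpts.StopOnError        = "on";                % Stop on Error
trainOpts.Plots              = "training-progress"; % Training plot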

Specify Parallel Training Options

To train your agent using parallel computing, on the Train tab, click the Use Parallel button. Training agents using parallel computing requires Parallel Computing Toolbox™ software. For more information, see Train Agents Using Parallel Computing and GPUs.

To specify options for parallel training, select Use Parallel > Parallel training options.

Parallel training options dialog box.

In the Parallel Training Options dialog box, you can specify the following training options.

OptionDescription
Parallel computing mode

Parallel computing mode, specified as one of the following values.

  • sync — Use parpool to run synchronous training on the available workers. In this case, workers pause execution until all workers are finished. The host updates the actor and critic parameters based on the results from all the workers and sends the updated parameters to all workers.

  • async — Use parpool to run asynchronous training on the available workers. In this case, workers send their data back to the host as soon as they finish and receive updated parameters from the host. The workers then continue with their task.

Type of data from workers

Type of data that workers send to the host, specified as one of the following values.

  • experiences — The simulation is performed by the workers, and the learning is performed by the host. Specifically, the workers simulate the agent against the environment and send experience data (observation, action, reward, next observation, and a flag indicating whether a terminal condition has been reached) to the host. For agents with gradients, the host computes gradients from the experiences, updates the network parameters, and sends the updated parameters back to the workers so that they can perform a new simulation against the environment.

  • gradients — Both simulation and learning are performed by the workers. Specifically, the workers simulate the agent against the environment, compute the gradients from experiences, and send the gradients to the host. The host averages the gradients, updates the network parameters, and sends the updated parameters back to the workers so that they can perform a new simulation against the environment.

Note

For DQN, DDPG, PPO, and TD3 agents, you must set this option to experiences.

Steps until data is sent

Number of steps after which workers send data to the host and receive updated parameters, specified as –1 or a positive integer. When this option is –1, the worker waits until the end of the episode and then sends all step data to the host. Otherwise, the worker waits the specified number of steps before sending data.

Transfer workspace variables to workers

Select this option to send model and workspace variables to parallel workers. When you select this option, the host sends variables used in models and defined in the MATLAB® workspace to the workers.

Random seed for workers

Randomizer initialization for workers, specified as one of the following values.

  • –1 — Assign a unique random seed to each worker. The value of the seed is the worker ID.

  • –2 — Do not assign a random seed to the workers.

  • Vector — Manually specify the random seed for each worker. The number of elements in the vector must match the number of workers.

Files to attach to parallel pool
Additional files to attach to the parallel pool. Specify names of files in the current working directory, with one name on each line.

Worker setup function
Function to run before training starts, specified as a handle to a function having no input arguments. This function is run once per worker before training begins. Write this function to perform any processing that you need prior to training.

Worker cleanup function
Function to run after training ends, specified as a handle to a function having no input arguments. You can write this function to clean up the workspace or perform other processing after training terminates.

The following figure shows an example parallel training configuration that uses the following files and functions.

  • Data file attached to the parallel pool — workerData.mat

  • Worker setup function — mySetup.m

  • Worker cleanup function — myCleanup.m

Parallel training options dialog showing file and function information.
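
For reference, a similar parallel configuration can be sketched at the command line through the ParallelizationOptions property of rlTrainingOptions. The sub-property names below are taken from the rlTrainingOptions documentation for releases that support both data types from workers; the mode and seed values are illustrative, while workerData.mat, mySetup, and myCleanup are the files and functions from the example above.

% Parallel training options (illustrative sketch; requires Parallel Computing Toolbox).
trainOpts.UseParallel = true;
trainOpts.ParallelizationOptions.Mode = "async";                        % Parallel computing mode
trainOpts.ParallelizationOptions.DataToSendFromWorkers = "experiences"; % Type of data from workers
trainOpts.ParallelizationOptions.StepsUntilDataIsSent = -1;             % Send data at the end of each episode
trainOpts.ParallelizationOptions.WorkerRandomSeeds = -1;                % Unique random seed per worker
trainOpts.ParallelizationOptions.TransferBaseWorkspaceVariables = "on"; % Transfer workspace variables to workers
trainOpts.ParallelizationOptions.AttachedFiles = "workerData.mat";      % Files to attach to parallel pool
trainOpts.ParallelizationOptions.SetupFcn = @mySetup;                   % Worker setup function
trainOpts.ParallelizationOptions.CleanupFcn = @myCleanup;               % Worker cleanup function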
