rlMBPOAgentOptions

Options for MBPO agent

    Description

    Use an rlMBPOAgentOptions object to specify options for model-based policy optimization (MBPO) agents. To create an MBPO agent, use rlMBPOAgent.

    For more information, see Model-Based Policy Optimization Agents.

    Creation

    Description

    opt = rlMBPOAgentOptions creates an options object for use as an argument when creating an MBPO agent, using all default options. You can modify the object properties using dot notation.

    opt = rlMBPOAgentOptions(Name=Value) sets option properties using name-value pair arguments. For example, rlMBPOAgentOptions(DiscountFactor=0.95) creates an option set with a discount factor of 0.95. You can specify multiple name-value pair arguments.

    Properties


    Number of epochs for training the environment model, specified as a positive integer.

    Number of mini-batches used in each environment model training epoch, specified as a positive integer or "all". When you set NumMiniBatches to "all", the agent selects the number of mini-batches such that all samples in the base agent's experience buffer are used to train the model.

    Size of random experience mini-batch for training the environment model, specified as a positive integer. During each model training episode, the agent randomly samples experiences from the experience buffer when computing gradients for updating the environment model properties. Larger mini-batches reduce the variance when computing gradients but increase the computational effort.

    Generated experience buffer size, specified as a positive integer. When the agent generates experiences, they are added to the model experience buffer.

    Ratio of real experiences in a mini-batch for agent training, specified as a nonnegative scalar less than or equal to 1.
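    For instance, the following sketch sets the agent to draw half of each training mini-batch from the real experience buffer and half from model-generated experiences (the value 0.5 is for illustration only):

      opt = rlMBPOAgentOptions;
      % Use 50% real and 50% model-generated experiences per mini-batch.
      opt.RealSampleRatio = 0.5;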

    Transition function optimizer options, specified as one of the following:

    • rlOptimizerOptions object — When your neural network environment has a single transition function or if you want to use the same options for multiple transition functions, specify a single options object.

    • Array of rlOptimizerOptions objects — When your neural network environment has multiple transition functions and you want to use different optimizer options for the transition functions, specify an array of options objects with length equal to the number of transition functions.

    Using these objects, you can specify training parameters for the transition deep neural network approximators as well as the optimizer algorithms and parameters.

    If you have previously trained transition models and do not want the MBPO agent to modify these models during training, set TransitionOptimizerOptions.LearnRate to 0.
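    As a sketch, the options for freezing previously trained transition models might look as follows; the use of two transition functions here is an assumption for illustration:

      opt = rlMBPOAgentOptions;
      % Zero learning rates keep both (assumed) pretrained transition
      % models fixed during MBPO training.
      opt.TransitionOptimizerOptions = [ ...
          rlOptimizerOptions(LearnRate=0), ...
          rlOptimizerOptions(LearnRate=0)];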

    Reward function optimizer options, specified as an rlOptimizerOptions object. Using this object, you can specify training parameters for the reward deep neural network approximator as well as the optimizer algorithm and its parameters.

    If you specify a ground-truth reward function using a custom function, the MBPO agent ignores these options.

    If you have a previously trained reward model and do not want the MBPO agent to modify the model during training, set RewardOptimizerOptions.LearnRate to 0.

    Is-done function optimizer options, specified as an rlOptimizerOptions object. Using this object, you can specify training parameters for the is-done deep neural network approximator as well as the optimizer algorithm and its parameters.

    If you specify a ground-truth is-done function using a custom function, the MBPO agent ignores these options.

    If you have a previously trained is-done model and do not want the MBPO agent to modify the model during training, set IsDoneOptimizerOptions.LearnRate to 0.
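    Combining the two preceding notes, a minimal sketch that keeps both a pretrained reward model and a pretrained is-done model fixed during training:

      opt = rlMBPOAgentOptions;
      % Prevent the MBPO agent from updating the reward and is-done
      % models by zeroing their learning rates.
      opt.RewardOptimizerOptions.LearnRate = 0;
      opt.IsDoneOptimizerOptions.LearnRate = 0;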

    Model roll-out options for controlling the number and length of generated experience trajectories, specified as an rlModelRolloutOptions object with the following fields. At the start of each epoch, the agent generates the roll-out trajectories and adds them to the model experience buffer. To modify the roll-out options, use dot notation.

    Number of trajectories for generating samples, specified as a positive integer.

    Initial trajectory horizon, specified as a positive integer.

    Option for increasing the horizon length, specified as one of the following values.

    • "none" — Do not increase the horizon length.

    • "piecewise" — Increase the horizon length by one after every N model training epochs, where N is equal to HorizonUpdateFrequency.

    Number of epochs after which the horizon increases, specified as a positive integer. When RolloutHorizonSchedule is "none", this option is ignored.

    Maximum horizon length, specified as a positive integer greater than or equal to RolloutHorizon. When RolloutHorizonSchedule is "none", this option is ignored.

    Training epoch at which to start generating trajectories, specified as a positive integer.
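    The roll-out fields above are set through dot notation on ModelRolloutOptions. A minimal sketch of a piecewise horizon schedule, using the field names referenced in the descriptions above (the specific values are for illustration only):

      opt = rlMBPOAgentOptions;
      % Start with a horizon of 1 and lengthen it by one step every
      % 5 model training epochs.
      opt.ModelRolloutOptions.RolloutHorizon = 1;
      opt.ModelRolloutOptions.RolloutHorizonSchedule = "piecewise";
      opt.ModelRolloutOptions.HorizonUpdateFrequency = 5;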

    Exploration model options for generating experiences using the internal environment model, specified as one of the following:

    • [] — Use the exploration policy of the base agent. You must use this option when training a SAC base agent.

    • EpsilonGreedyExploration object — You can use this option when training a DQN base agent.

    • GaussianActionNoise object — You can use this option when training a DDPG or TD3 base agent.

    The exploration model uses only the initial noise option values and does not update the values during training.

    To specify NoiseOptions, create a default model object. Then, specify any nondefault model properties using dot notation.

    • Specify epsilon greedy exploration options.

      opt = rlMBPOAgentOptions;
      opt.ModelRolloutOptions.NoiseOptions = ...
          rl.option.EpsilonGreedyExploration;
      opt.ModelRolloutOptions.NoiseOptions.EpsilonMin = 0.03;
    • Specify Gaussian action noise options.

      opt = rlMBPOAgentOptions;
      opt.ModelRolloutOptions.NoiseOptions = ...
          rl.option.GaussianActionNoise;
      opt.ModelRolloutOptions.NoiseOptions.StandardDeviation = sqrt(0.15);

    For more information on noise models, see Noise Models.

    Object Functions

    rlMBPOAgent — Model-based policy optimization reinforcement learning agent

    Examples


    Create an MBPO agent options object, specifying that 30% of the experiences in each agent-training mini-batch come from the real experience buffer.

    opt = rlMBPOAgentOptions(RealSampleRatio=0.3)
    opt = 
      rlMBPOAgentOptions with properties:
    
           NumEpochForTrainingModel: 1
                     NumMiniBatches: 10
                      MiniBatchSize: 128
         TransitionOptimizerOptions: [1×1 rl.option.rlOptimizerOptions]
             RewardOptimizerOptions: [1×1 rl.option.rlOptimizerOptions]
             IsDoneOptimizerOptions: [1×1 rl.option.rlOptimizerOptions]
        ModelExperienceBufferLength: 100000
                ModelRolloutOptions: [1×1 rl.option.rlModelRolloutOptions]
                       NoiseOptions: []
                    RealSampleRatio: 0.3000
                         InfoToSave: [1×1 struct]
    
    

    You can modify options using dot notation. For example, set the mini-batch size to 64.

    opt.MiniBatchSize = 64;


    Version History

    Introduced in R2022a