Reinforcement Learning Toolbox

Design and train policies using reinforcement learning

Reinforcement Learning Toolbox™ provides an app, functions, and a Simulink® block for training policies using reinforcement learning algorithms, including DQN, PPO, SAC, and DDPG. You can use these policies to implement controllers and decision-making algorithms for complex applications such as resource allocation, robotics, and autonomous systems.

The toolbox lets you represent policies and value functions using deep neural networks or look-up tables and train them through interactions with environments modeled in MATLAB® or Simulink. You can evaluate the single- or multi-agent reinforcement learning algorithms provided in the toolbox or develop your own. You can experiment with hyperparameter settings, monitor training progress, and simulate trained agents either interactively through the app or programmatically. To speed up training, you can run simulations in parallel on multiple CPUs, GPUs, computer clusters, and the cloud (with Parallel Computing Toolbox™ and MATLAB Parallel Server™).

You can import existing policies from deep learning frameworks such as TensorFlow™ Keras and PyTorch through the ONNX™ model format (with Deep Learning Toolbox™). You can generate optimized C, C++, and CUDA® code to deploy trained policies on microcontrollers and GPUs. The toolbox includes reference examples to help you get started.



Reinforcement Learning Agents

Create and configure reinforcement learning agents to train policies in MATLAB and Simulink. Use built-in reinforcement learning algorithms or develop custom ones.

Reinforcement Learning Algorithms

Create agents using deep Q-network (DQN), deep deterministic policy gradient (DDPG), proximal policy optimization (PPO), and other built-in algorithms. Use templates to develop custom agents for training policies.

Training algorithms available in Reinforcement Learning Toolbox.
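
To give a feel for what a value-based algorithm such as DQN computes during training, here is a minimal, language-neutral sketch in Python of the one-step temporal-difference target. It is an illustration only, not the toolbox API: the function name `td_targets` and its arguments are hypothetical, and a real DQN agent adds a neural network, a replay buffer, and a target network around this step.

```python
def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """Compute one-step TD targets: r + gamma * max_a' Q(s', a').

    rewards       -- list of rewards, one per transition
    next_q_values -- list of lists: Q-value estimates for every action
                     in the next state of each transition
    dones         -- list of booleans: True if the transition ended the episode
    gamma         -- discount factor (hypothetical default)
    """
    targets = []
    for reward, q_next, done in zip(rewards, next_q_values, dones):
        # No bootstrapping from a terminal state: its future value is zero.
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(reward + bootstrap)
    return targets


# Two transitions: the first continues, the second ends the episode.
example_targets = td_targets(
    rewards=[1.0, 0.0],
    next_q_values=[[0.5, 2.0], [3.0, 1.0]],
    dones=[False, True],
    gamma=0.5,
)
print(example_targets)
```

The agent then adjusts its Q-value estimates toward these targets, which is the part the built-in algorithms handle for you.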

Reinforcement Learning Designer App

Interactively design, train, and simulate reinforcement learning agents. Export trained agents to MATLAB for further use and deployment.

Policy and Value Function Representation Using Deep Neural Networks

For complex systems with large state-action spaces, define deep neural network policies programmatically, using layers from Deep Learning Toolbox, or interactively, with Deep Network Designer. Alternatively, use the default network architecture suggested by the toolbox. Initialize the policy using imitation learning to accelerate training. Import and export ONNX models for interoperability with other deep learning frameworks.

Single- and Multi-Agent Reinforcement Learning in Simulink

Create and train reinforcement learning agents in Simulink with the RL Agent block. Train multiple agents simultaneously (multi-agent reinforcement learning) in Simulink using multiple instances of the RL Agent block.

The reinforcement learning agent block for Simulink.

Environment Modeling

Create MATLAB and Simulink environment models. Describe system dynamics and provide observation and reward signals for training agents.

Simulink and Simscape Environments

Use Simulink and Simscape™ to create a model of an environment. Specify the observation, action, and reward signals within the model.

Simulink environment model for a biped robot.

MATLAB Environments

Use MATLAB functions and classes to model an environment. Specify observation, action, and reward variables within the MATLAB file.

MATLAB environment for a three-degrees-of-freedom rocket.
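
The idea of an environment that exposes observations, actions, and rewards can be sketched as a small class with a reset step and an advance step. The example below is written in Python purely as an illustration and is not MATLAB or toolbox code; the class `PointMassEnv`, its parameters, and its reward weights are all assumptions chosen for the example.

```python
class PointMassEnv:
    """Hypothetical 1-D point mass that should be driven to position 0."""

    def __init__(self, dt=0.1, limit=5.0):
        self.dt = dt          # integration time step (assumed value)
        self.limit = limit    # episode ends if |position| exceeds this
        self.position = 0.0
        self.velocity = 0.0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.position = 1.0
        self.velocity = 0.0
        return (self.position, self.velocity)

    def step(self, force):
        """Apply an action and return (observation, reward, done)."""
        # Action: a force clipped to the range [-1, 1].
        force = max(-1.0, min(1.0, force))
        # Dynamics: semi-implicit Euler update of a unit mass.
        self.velocity += force * self.dt
        self.position += self.velocity * self.dt
        # Reward: penalize distance from the origin, speed, and effort.
        reward = -(self.position ** 2
                   + 0.1 * self.velocity ** 2
                   + 0.01 * force ** 2)
        done = abs(self.position) > self.limit
        return (self.position, self.velocity), reward, done


env = PointMassEnv()
observation = env.reset()
observation, reward, done = env.step(-1.0)
print(observation, reward, done)
```

Keeping the dynamics, the reward, and the termination condition in one step function makes it easy to swap the reward design without touching the agent.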

Accelerating Training

Speed up training using GPU, cloud, and distributed computing resources.

Speeding up training using parallel computing.

GPU Acceleration

Speed up deep neural network training and inference with high-performance NVIDIA® GPUs. Use MATLAB with Parallel Computing Toolbox and most CUDA-enabled NVIDIA GPUs that have compute capability 3.0 or higher.

Accelerate training using GPUs.

Code Generation and Deployment

Deploy trained policies to embedded devices or integrate them with a wide range of production systems.

Code Generation

Use GPU Coder™ to generate optimized CUDA code from MATLAB code representing trained policies. Use MATLAB Coder™ to generate C/C++ code to deploy policies.

Generating CUDA code using GPU Coder.

MATLAB Compiler Support

Use MATLAB Compiler™ and MATLAB Compiler SDK™ to deploy trained policies as standalone applications, C/C++ shared libraries, Microsoft® .NET assemblies, Java® classes, and Python® packages.

Packaging and sharing policies as standalone programs.

Reference Examples

Design controllers and decision-making algorithms for robotics, automated driving, calibration, scheduling, and other applications.

Getting Started

See how to develop reinforcement learning policies for problems such as inverting a simple pendulum, navigating a grid world, balancing a cart-pole system, and solving generic Markov decision processes.
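
As a rough picture of what a grid-world example with a look-up table involves, the following Python sketch runs tabular Q-learning on a five-cell corridor. It is a simplified illustration rather than toolbox code: the corridor layout, the function name `train_corridor`, the hyperparameters, and the uniformly random behavior policy (used instead of epsilon-greedy to keep the sketch short) are all assumptions.

```python
import random


def train_corridor(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a 1-D corridor with states 0..4.

    The agent starts in state 0, the goal is state 4, and reaching the
    goal yields reward 1. Actions: 0 = move left, 1 = move right.
    """
    rng = random.Random(seed)
    n_states, goal = 5, 4
    # Look-up table: one row per state, one Q-value per action.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        state = 0
        for _ in range(1000):  # step cap so every episode terminates
            # Q-learning is off-policy, so a random behavior policy works.
            action = rng.randrange(2)
            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == goal
            reward = 1.0 if done else 0.0
            # One-step target; no bootstrapping from the terminal state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train_corridor()
# Greedy policy for the non-terminal states 0..3.
greedy_policy = [0 if row[0] > row[1] else 1 for row in q_table[:4]]
print(greedy_policy)
```

After training, the greedy policy read from the table moves right in every cell, which is the shortest path to the goal in this corridor.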

Automated Driving

Design reinforcement learning policies for automated driving applications such as adaptive cruise control, lane keeping assistance, and automatic parking.

Tuning, Calibration, and Scheduling

Design reinforcement learning policies for tuning, calibration, and scheduling applications.

Resource allocation problem for water distribution.

Reinforcement Learning Video Series

Watch the videos in this series to learn more about reinforcement learning.
