Teaching Deep Reinforcement Learning with MATLAB
Dr. Rifat Sipahi, Northeastern University
Watch this webinar by Professor Rifat Sipahi from Northeastern University to learn about the curriculum materials his team developed for teaching RL and DRL with MATLAB®. The RL modules let students implement various applications such as grid-world navigation, temperature control, walking robots, and portfolio management.
Download the curriculum materials here. Solutions are available upon instructor request.
Published: 27 Sep 2022
Greetings, everyone. This is Rifat Sipahi from the Department of Mechanical and Industrial Engineering at Northeastern University in Boston. Today I would like to present to you "MATLAB Deep Reinforcement Learning with Engineering Applications," a joint work with my collaborators in our department's master's program, Professor Dehghani and Mr. Belsare.
Today, what I would like to do is, first of all, briefly motivate the data-centric industrialization that has been growing rapidly, even exponentially, in many areas. Some of these developments relate to control systems as well; one can think, for example, of the cyber-physical systems that many of our colleagues in the controls field study. As these systems evolve and become more and more valuable and interesting, we are reaching a level -- or perhaps have already reached a level -- in our educational world where we need to teach our students the tools that surround data-driven techniques.
And among these, obviously, one goal is to use data to make efficient and optimal decisions in complex, interconnected dynamical systems. This is a predicted trend line, and it is not stagnant; it is growing, and we expect a growing number of areas where machine learning and reinforcement learning tools will be valuable and relevant in industrial and control systems processes. Reinforcement learning is one of the tools that is really at the center of this growing field.
Now, what we have realized in view of these discussions is that there is a growing expectation in industry that future engineers will need skills in data science and data-driven tools. On the other hand, the point of entry to learning these tools may not be straightforward, and depending on which majors students are studying, these entry points may be more or less challenging. So how could we reduce those barriers to entry?
Meanwhile, is there a way we can develop tools that are aligned with students' disciplines? For example, can we teach reinforcement learning to mechanical engineering students through mechanical engineering applications, and to electrical engineering students through electrical engineering applications? This way, there is a connection between what students are studying and how they frame reinforcement learning in their minds as they learn it.
Of course, these new skills will provide new career opportunities for students, so they are obviously very important and valuable. Now, given that many of our students learn MATLAB in multiple courses, we saw that we could start bridging the gap in students' understanding of reinforcement learning through MATLAB's existing Reinforcement Learning Toolbox. And because we primarily teach mechanical and industrial engineering students, we also started thinking about how to put these together in the context of industrial and mechanical engineering.
In a nutshell, although we are not going to go into the details of reinforcement learning, we envision a setup in which an agent executes actions and receives observations and rewards: the environment changes upon an action, and it emits observations and rewards. Now, there are different ways to create a reinforcement learning environment and framework in a computational setting. One popular one is obviously Python, and here we do not mean to say one shouldn't use Python. But if our students already have skills in MATLAB and find it a natural transition, they can build on those MATLAB skills to learn and practice reinforcement learning.
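To make the loop just described concrete, here is a minimal sketch of the agent-environment interaction, written in Python purely for illustration (the modules themselves use MATLAB). The corridor environment, its reward values, and the trivial fixed policy are all my own assumptions, not part of the curriculum materials.

```python
class CorridorEnv:
    """A toy environment: a 1-D corridor with states 0..4 and the goal at state 4.
    (Hypothetical example, not from the course modules.)"""
    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the corridor walls clamp the state
        self.state = min(4, max(0, self.state + action))
        reward = 1.0 if self.state == 4 else -0.1  # goal reward, small step cost
        done = self.state == 4
        return self.state, reward, done

env = CorridorEnv()
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    action = +1                            # a trivial fixed policy standing in for the agent
    obs, reward, done = env.step(action)   # the environment emits observation and reward
    total_reward += reward
```

Whatever the framework, the same three-part exchange -- action out, observation and reward back -- is the interface every RL library formalizes.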
So, reinforcement learning with MATLAB. So we see several advantages from our point of view. As I said earlier, students learn and utilize MATLAB in their courses already. So that's a good starting point.
There is a dedicated Reinforcement Learning Toolbox with preloaded functions to create environments, agents, and training policies in MATLAB, and there is provision to build custom environments using simulations -- I will say a bit more about the details in a moment. There is also ease of integrating MATLAB with other toolboxes, such as deep learning, optimization, and Simulink.
And there is a Reinforcement Learning Designer app, if one would like to use an app, and there are options to also integrate GPUs and Python reinforcement learning libraries. And, of course, there is dedicated online documentation available for support.
In our research, we would like to look at the Reinforcement Learning Toolbox and create a set of modules that become progressively more complex, growing the difficulty gradually, in which the Reinforcement Learning Toolbox is integrated and leveraged in various ways. Our goals are for the students to learn the basics of reinforcement learning, MDP and RL problem formulation, and to apply RL and deep learning algorithms using the MATLAB Reinforcement Learning Toolbox.
Students will then be able to create RL environments in MATLAB and Simulink, and they will be able to apply these tools in a variety of applications. There should also be documentation to guide them through -- for example, tutorials for this purpose. To meet that need, we have the following high-level plan.
We would like to develop modules that leverage the Reinforcement Learning Toolbox. These modules are built with several engineering applications in mind, and they cover a range of MathWorks platforms: not only MATLAB, but also how to integrate it with Simulink, deep learning, and the Reinforcement Learning Toolbox altogether.
And to address students' needs in the application domain, we decided to focus on a cart-pole balancing problem, a robot walking problem, a portfolio management problem, and an HVAC control problem. Along with these modules, we decided to develop documentation and tutorials to be shared in two different ways: one on the MATLAB File Exchange website, geared toward students, and, upon request from instructors, a private instructor copy in which solution sets are made available. That is how we charted this project.
Now, in more detail, we want to teach the basics of the Reinforcement Learning Toolbox functions and demonstrate them in the context of applications and different environments. These environments can be predefined or customized, and they can be built in Simulink, et cetera. We want to include deep learning and the Reinforcement Learning Designer app within our modules, and we would like to formulate several of the application problems I just mentioned within the Reinforcement Learning Toolbox in MATLAB.
Now, how do we get started? Well, here is how we actually broke these modules down. Each stage is a different module highlighting a different skill or a different way of teaching certain concepts.
Stage one is focused on reinforcement learning concepts and reinforcement learning toolbox, where the environment is predefined and we have tabular agents. And then in stage two, we have used reinforcement learning designer app, and we have designed stochastic and custom MATLAB environments. And then we have a deep learning agent.
In stage three, we have integrated MATLAB and Simulink to create an environment using Simulink, again with a deep learning agent. In stage four, we have additional mechanical and industrial engineering environments that I am going to show you in a moment. Now, let's walk through these stages to give you a high-level picture of what they entail.
On the far left, we demonstrate a cart-pole balancing problem, which has a mechanical and control systems flavor. The original model we actually borrowed from Greg Surma, and I have a hyperlink here. Anyone who would like to see the slides will have access to all the links.
And that model by Greg Surma is based on Python; here we demonstrate it in the MATLAB environment. In the second column on the left, we have a thermostat and servomotor control problem, which uses a model from the MathWorks website along with the Reinforcement Learning Toolbox.
And in the third column, we again use a model from the MathWorks that is based on portfolio optimization. The last column on the far right is another model from the MathWorks that is focused on robotic motion, primarily walking. These are the environments we have built for the benefit of our users.
All right. Now, in stage one, as I mentioned, we have a predefined MATLAB environment, and we are using Q-learning and SARSA. We start with a simple, introductory-level understanding of how an agent explores the environment, how it earns rewards, where it goes, and how it decides, in both one-dimensional and two-dimensional settings.
And one nice thing here is that for stage one we have also prepared a MATLAB script that does not use the Reinforcement Learning Toolbox. This way, we can ask students to produce their own script based on the basic underlying principles of reinforcement learning in a grid world problem. Once the students obtain the results, we take them to the next level within stage one, where they use the Reinforcement Learning Toolbox for the same goal in a grid world problem and solve the problem again with the toolbox.
In this way, in the first exercise, students get to know what's under the hood, and in the second exercise they see how the Reinforcement Learning Toolbox provides a straightforward and convenient way to build and run the RL problem. Stage one has led to documentation, code, and assignments that we have built and made available online.
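A from-scratch tabular Q-learning loop of the kind the first exercise asks for can be sketched as follows. This is in Python only for illustration (the exercise itself is a MATLAB script), and the 1-D grid world, hyperparameters, and reward structure are assumptions of this sketch rather than the module's actual setup.

```python
import random

random.seed(0)

N_STATES, GOAL = 5, 4          # a 1-D grid world: states 0..4, goal at state 4
ACTIONS = [-1, +1]             # move left / move right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate

# Q-table: one row per state, one value per action
Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]

def step(state, action):
    nxt = min(N_STATES - 1, max(0, state + action))
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

for episode in range(200):
    state, done = 0, False
    while not done:
        # epsilon-greedy action selection
        if random.random() < EPS:
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q[state][i])
        nxt, reward, done = step(state, ACTIONS[a])
        # Q-learning update: bootstrap from the greedy value of the next state
        target = reward + (0.0 if done else GAMMA * max(Q[nxt]))
        Q[state][a] += ALPHA * (target - Q[state][a])
        state = nxt

# Greedy policy after training: action index 1 ("right") in every non-goal state
policy = [max(range(len(ACTIONS)), key=lambda i: Q[s][i]) for s in range(GOAL)]
```

Swapping the update target for the value of the action actually taken next would turn this into SARSA, the other algorithm covered in stage one; seeing that one-line difference is exactly the kind of "under the hood" insight the from-scratch exercise aims for.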
In stage two, we have a cart-pole problem, and this is definitely a more advanced level of implementation. We make all the documentation and code available for instructors to use. Of course, instructors have a better read of their classes, what their students know, and when the best time is to bring in an example of this nature. We believe some instructors and students will also find the Reinforcement Learning Designer app useful as a platform to interact with the RL problem.
Stage two again leads to a set of documentation, code, and assignments for both the instructor and the student, also available online. Stage three demonstrates how to bring in Simulink to describe an environment. Here we have combined that with deep Q-learning, and the HVAC model -- the thermostat controller model we utilized -- comes from the MathWorks website. We have built the stage three presentation on that.
And here is just a quick snapshot of how this might look. For example, if the external temperature is fluctuating, shown in blue, how does one heat an indoor environment more or less to maintain a constant temperature indoors? One can even test a more model-based approach, such as a PID controller, and compare it against the reinforcement learning outcome to see the benefits and perhaps the challenges of either control approach.
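For intuition, here is what such a classical baseline might look like: a toy first-order room model driven by a PI controller (the integral and proportional terms of a PID, with the derivative term omitted for simplicity). This is a Python sketch with made-up parameters, not the MathWorks thermostat model used in stage three.

```python
# All parameters below (loss coefficient, gains, setpoint) are illustrative assumptions.
dt, k_loss = 1.0, 0.05          # time step [s], heat-loss coefficient
T_out, setpoint = 10.0, 21.0    # outdoor temperature and indoor target [deg C]
Kp, Ki, u_max = 0.5, 0.05, 2.0  # PI gains and maximum heater power

T, integ = 10.0, 0.0            # room starts at outdoor temperature
for _ in range(600):
    e = setpoint - T
    u_raw = Kp * e + Ki * integ
    u = min(u_max, max(0.0, u_raw))        # heater power is physically bounded
    if u == u_raw:                         # anti-windup: freeze integral while saturated
        integ += e * dt
    T += dt * (-k_loss * (T - T_out) + u)  # first-order heat-loss dynamics
```

A trained RL agent would face the same bounded-actuator, fluctuating-disturbance problem; plotting both trajectories side by side is precisely the comparison the stage three module invites.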
Stage three once again yields documentation and code for this purpose. In stage four, we have two applications: the robot walking that I mentioned earlier and the portfolio management. Again, these have been fully developed and are available within the documentation package we provide online.
Now, we did not only develop these modules for everyone's use. In the picture on the left, Sahil is presenting to a number of students; we organized three workshops after the modules were built, and 24 students participated in total, with different levels of knowledge. We did not go through all the stages of the documentation, but we presented stage one in two separate meetings.
And we collected feedback from students, keeping in mind that they all have different levels of knowledge about controls, dynamics, machine learning, and coding, and we obtained an initial sense of how the students felt about these modules. They definitely appreciated the ease of using predefined reinforcement learning functions and the structured documentation we prepared, and they very much appreciated having a visual means to interact with the problem through the Designer app.
And we were expecting some weaknesses, and one of the important ones is that more time is needed to prepare students mathematically in the foundations of RL. That is really important, and it is why we thought a modular structure matters: the instructors who will hopefully be borrowing and using our modules will know the best timing to introduce them, once the mathematical foundations are in place in their classes.
Some additional description of the RL toolbox would also be useful -- an important note for ourselves and for anyone who is going to use our modules. We likewise note that additional discussions are needed to bring students up to speed on deep reinforcement learning agents. These are good messages for us as we keep thinking about how to improve, refine, and polish these modules for anyone who wants to teach with them.
We also presented some of the results at a recent INFORMS annual meeting -- my colleague, Professor Dehghani, is presenting in the picture here -- and another one at the American Control Conference 2022 in Atlanta on June ninth. Now, one may wonder where these resources are and how to contact us. Here is some information.
Right now, the modules we have put together are available on MATLAB File Exchange at this link. If there are any questions, please reach out to me or Professor Dehghani by email. And for instructors, if you would like to obtain a copy of the files with solution sets, please let us know, because those files are not available publicly.
Now, there are obviously a lot of interesting things here in terms of education, pedagogy, and recommendations. Overall, one advantage is our ability to interface different MATLAB toolboxes to create these complex decision-making problems and solve them. That is one advantage we find really important.
And students are always open to learning, and we have observed that the students attracted to these presentations are focused heavily on data analytics, industrial engineering, robotics, and mechatronics. There are four stages of the RL modules, as I presented, and there are different ways to integrate them. Depending on what the students are learning, the instructor can choose which stage or stages are most appropriate and what the timing of these modules should be, whether as problems to solve in class or as pieces assigned as homework, et cetera.
But an instructor could also use these modules outside the class setting -- for example, in semi-formal departmental seminars, or when speaking to a group of students to show at a high level what the Reinforcement Learning Toolbox can do. That said, given the level of the topics covered in these reinforcement learning modules, we strongly advise that an instructor and/or a knowledgeable TA support the learning in a semi-structured environment, because, at least for now, these modules are not designed to be self-paced; they are not in a MOOC type of format.
Obviously, a primer on reinforcement learning and the mathematics underlying why and how it works will be really important before introducing these modules. We have also thought about how else these modules could be utilized. For example, we are envisioning in our department a possible approach as follows: a set of training sessions or seminars throughout the semester, outside a class setting, where a group of students come and learn these topics, and we lecture and provide all the necessary tools, foundations, and mathematics along the way.
And everything would culminate in a full-day or weekend-long hackathon, with food and maybe prizes for winners. We think this could be structured so that a primer and the mathematical background are provided to the students throughout the semester, and then the four stages I just presented would be broken down into two sessions each, each session about two hours, spread over about eight weeks of the semester.
And volunteer students would show up and attend these sessions, working toward competing in the end-of-semester hackathon. That could be another way of bringing these modules into a competition-driven learning platform. With that, I would like to conclude my talk. We are very much indebted to the MathWorks for the funding we received to conduct this research and put the materials together. Along the way, we have enjoyed working with and have been guided by Neha, Emmanouil, and Melda at the MathWorks, who provided time, feedback, and perspectives on the educational world.
And thank you so much for listening to this recording. And for any questions, please reach out to us. Thank you again. Bye-bye.