Training a neural network in a real world system (inverted pendulum)
Elvin Toh
on 13 Nov 2020
Commented: Elvin Toh
on 16 Nov 2020
Hi,
I have physically built an inverted pendulum system driven by a brushed DC motor and controlled by an Arduino, programmed with Simulink blocks (the Arduino support package for Simulink). The encoder functions are implemented via the S-Function Builder and fed into my blocks.
For the past two weeks, I have been able to successfully control my inverted pendulum using a simple PID controller, all implemented within Simulink and deployed on my Arduino Mega.
But I would like to up the game and use machine learning (not PID) to control my inverted pendulum. I do NOT have a Simscape model or a state-space representation of the system, nor do I intend to model it. I would like to use any of the reinforcement learning methods from MATLAB/Simulink to learn the system physically in real time, treating it as a black box (I don't mind if it takes days to physically run thousands of actual trials).
I'm currently using the DDPG agent and have been able to run validateEnvironment without any errors (the RL Agent block gets the 'observations', 'rewards', and 'isdone' signals from my 'environment' and sends an 'action' back to it).
The problem I am facing now is that when I start training, my DC motor does move/fluctuate according to the 'actions' sent to the PWM pin of my Arduino (again, via Simulink blocks), but my S-function encoder block does not return any values at all; it constantly remains at 0. I have used Rate Transition blocks and tried different sample times, but it continues to output 0 whenever I use a Scope/Display to monitor the encoder block's output. And because it constantly produces 0, my reinforcement learning episodes gain no meaningful training signal at all.
To prove that the encoder block is indeed working, I even deleted the RL Agent block, and then I could see proper values from my encoder block.
Can someone tell me if it is indeed possible to train a neural network based on real-time (or near-real-time; I don't even mind if it lags) inputs/outputs of an Arduino Mega connected to an inverted pendulum system?
0 Comments
Accepted Answer
Emmanouil Tzorakoleftherakis
on 16 Nov 2020
Hello,
I am not sure why you get zero values consistently, but since you say this works when you don't have the agent block in your model, it looks like an issue with bidirectional communication. Try replacing the RL Agent block with, e.g., a Constant block or a MATLAB Function block that publishes actions at the same rate. If you still see zero values, then check your S-function implementation.
On a different note, it is certainly possible to use RL to train on a physical system, but that requires some extra work and attention. For example, how do you define an episode on real hardware? Will you reset the system yourself after each episode? Controlling motors with trial-and-error methods such as RL might create issues, and so on.
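To make the episode question concrete: on real hardware you need an explicit termination rule, because when 'isdone' fires you (or some mechanism) must physically re-centre the rig before the next episode. A minimal sketch of the kind of reward/isdone logic that would sit in your reward subsystem, shown here in Python with made-up angle and track limits (not tuned values):

```python
import math

def step_signals(theta, x, theta_limit=math.radians(12), x_limit=0.5):
    """Compute reward and isdone for one control step.

    theta: pendulum angle from upright (rad); x: cart position (m).
    The limits are illustrative placeholders, not tuned values.
    """
    # Episode ends when the pendulum falls over or the cart leaves its track;
    # on a physical rig, this is the point where a manual/automatic reset happens.
    isdone = abs(theta) > theta_limit or abs(x) > x_limit
    # Simple shaped reward: +1 for surviving the step, penalised by deviation.
    reward = 1.0 - 0.5 * (theta / theta_limit) ** 2 - 0.5 * (x / x_limit) ** 2
    if isdone:
        reward = -10.0  # penalty for ending the episode
    return reward, isdone
```

The same comparisons and arithmetic map directly onto Abs, Relational Operator, and Sum blocks in Simulink.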
Most importantly, it seems your workflow is as follows: the policy lives in your Simulink model, sends actions to the system, and reads observations back from the board. This will most likely not work for a system that requires fast, real-time control like the pendulum. Your policy will always lag, and so will the observations, so you will likely never get close to the desired equilibrium and collect the "good" rewards (all the more so if you are trying to swing the pendulum up).
The best approach is to have the policy deployed on your board and have the training algorithm in your Simulink model update the parameters of the deployed policy periodically. Unfortunately, this is not currently supported, mainly because you cannot deploy neural nets created with Deep Learning Toolbox layers on your Arduino (we are working on it). The workaround would be to recreate the neural network with core Simulink blocks (I am assuming your net for this problem is not very complex, consisting primarily of fully connected layers and activations) and deploy it that way to your board. The deployed policy will then be able to control the pendulum in real time, and you can adjust the fully connected layer weights periodically.
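Recreating the network with core blocks is just a few matrix multiplies and activations. As a sanity check before wiring up Gain/Sum/Tanh blocks, you can compute the same forward pass offline; a hedged Python sketch of a two-layer tanh policy (the weights here are placeholders, not a trained controller — in practice you would export them from the trained agent):

```python
import math

def dense(W, b, x):
    """One fully connected layer, y = W*x + b, using plain lists."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def policy(x, W1, b1, W2, b2):
    """action = tanh(W2 * tanh(W1*x + b1) + b2).

    This mirrors what a chain of Simulink Gain, Sum, and Tanh blocks
    computes; the final tanh keeps the action bounded in [-1, 1],
    e.g. a normalised PWM duty cycle.
    """
    h = [math.tanh(v) for v in dense(W1, b1, x)]
    return [math.tanh(v) for v in dense(W2, b2, h)]
```

Updating the deployed policy then amounts to periodically overwriting the Gain blocks' weight matrices with fresh values from the trainer.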
Hope that helps
More Answers (0)