Reinforcement learning algorithm that is deployed online on a time-varying system
Hello,
I asked a similar question at https://www.mathworks.com/matlabcentral/answers/822700-reinforcement-learning-lqr-example-question?s_tid=srchtitle, but here I want to focus on one very specific part.
I am trying to program a reinforcement learning algorithm that has to learn and control a system at the same time. I have one year of hourly load data that affects the system, and I want to design an agent that learns and controls the system over that specific year, given that data.
My problem is that the training phase seems to be isolated from the simulation phase, so it is not clear to me how I can make the system advance through time during training, record the state variables, and then continue into the deployment phase. My system is characterized by 8760 matrices, one for every hour of the year. I want to make sure that:
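One way to picture this setup is a discrete-time system whose matrices change every hour. The sketch below assumes a linear form x[k+1] = A[k] x[k] + B[k] u[k] + E[k] d[k], where k is the hour of the year and d[k] is the hourly load; that model structure, the arrays `A`, `B`, `E`, `load`, and the function `simulate_hour` are hypothetical, since the question does not say how the matrices and the load enter the dynamics. It is a plain Python/NumPy illustration, not Reinforcement Learning Toolbox code.

```python
import numpy as np

HOURS_PER_YEAR = 8760  # 365 days * 24 hours

def simulate_hour(k, x, u, A, B, E, load):
    """Advance the state by one hour using the matrices for hour k.

    Assumed (hypothetical) model: x[k+1] = A[k] @ x + B[k] @ u + E[k] * load[k]
    Shapes: A (8760, n, n), B (8760, n, m), E (8760, n), load (8760,)
    """
    return A[k] @ x + B[k] @ u + E[k] * load[k]

# Hypothetical example data: 2 states, 1 input.
# A = 0.9 * I for every hour (stable), random B, unit E, random hourly load.
rng = np.random.default_rng(0)
n, m = 2, 1
A = np.tile(0.9 * np.eye(n), (HOURS_PER_YEAR, 1, 1))
B = rng.normal(size=(HOURS_PER_YEAR, n, m))
E = np.ones((HOURS_PER_YEAR, n))
load = rng.uniform(0.0, 1.0, size=HOURS_PER_YEAR)

# Simulate the first day with zero input; hour k always uses A[k], B[k], E[k].
x = np.zeros(n)
for k in range(24):
    x = simulate_hour(k, x, np.zeros(m), A, B, E, load)
```

Because the hour index selects the matrices, the simulation clock and the data stay aligned as long as every call uses the next value of k.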
1) When I train my agent, each training step advances the system to a new hour. Basically, I want to make sure my system never gets to try two different things in the same hour. To achieve that, do I need to set the episode length to 1?
2) I have a history of all the actions my agent took during training, together with the state values. The training statistics only show the reward per episode; I can't see the actions my agent took or the corresponding state values.
3) The reset function does not rewind the states. This is my main confusion: I don't want the environment to reset the states; it has to continue from the last values. So why am I forced to use a reset function?
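Conceptually, the three requirements above can be met by an environment object that owns both the hour counter and the history. The sketch below is plain Python, not the Reinforcement Learning Toolbox API: `step` always advances to the next hour, so no hour is ever tried twice; every transition is appended to a log; and `reset` returns the current state instead of reinitializing it, so an episode boundary does not rewind time. The class name `ContinuingEnv`, the scalar dynamics, and the quadratic reward are assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ContinuingEnv:
    """Hypothetical environment that never rewinds time.

    Assumed scalar dynamics: x[k+1] = a[k] * x + b[k] * u + load[k]
    """
    a: list
    b: list
    load: list
    x: float = 0.0
    hour: int = 0
    history: list = field(default_factory=list)  # one record per hour

    def reset(self):
        # Do NOT reinitialize: hand back the state reached so far.
        return self.x

    def step(self, u):
        if self.hour >= len(self.load):
            raise RuntimeError("end of the year reached")
        k = self.hour
        x_next = self.a[k] * self.x + self.b[k] * u + self.load[k]
        reward = -(x_next ** 2) - 0.1 * u ** 2  # assumed quadratic cost
        self.history.append({"hour": k, "state": self.x, "action": u,
                             "next_state": x_next, "reward": reward})
        self.x = x_next
        self.hour += 1  # every step consumes exactly one hour
        done = self.hour >= len(self.load)
        return x_next, reward, done

# Usage: two "episodes" of two steps each over a 4-hour toy horizon.
env = ContinuingEnv(a=[0.5] * 4, b=[1.0] * 4, load=[1.0] * 4)
for episode in range(2):
    x = env.reset()  # continues from the last state, not from 0
    for _ in range(2):
        x, r, done = env.step(-0.5 * x)
```

With this design, the episode length only decides how often the learner sees an episode boundary; it does not rewind the clock. Because the per-hour actions and states live in the environment object, they can be inspected after training independently of the per-episode training statistics, which also avoids the global variables mentioned below.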
I think that as long as I have these three things, I should be able to deploy my system to actively learn for x hours and then control for the remaining 8760-x hours. From what I see in the toolbox, there doesn't seem to be a clear way to do that. Can someone clarify this for me?
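The "learn for x hours, then control for 8760-x hours" schedule can be sketched as a single pass over the year with a learning flag. To keep the example short, the learner below is an epsilon-greedy bandit that picks among a few candidate feedback gains and keeps a running average reward for each; this is a deliberately simple stand-in for a full RL agent, not what the toolbox does. The gains, `learn_hours`, the scalar dynamics, and the function name `run_year` are all hypothetical.

```python
import random

def run_year(a, b, load, gains, learn_hours, epsilon=0.2, seed=0):
    """Single pass over the hours: explore/learn first, then exploit only.

    Assumed dynamics: x[k+1] = a[k]*x + b[k]*u + load[k], with u = -K*x.
    Returns a per-hour log of (hour, phase, gain, state, reward).
    """
    rng = random.Random(seed)
    # Running mean reward per gain; starting at 0 is optimistic because
    # rewards are negative, so untried gains get picked early on.
    value = [0.0] * len(gains)
    count = [0] * len(gains)
    x, log = 0.0, []
    for k in range(len(load)):
        learning = k < learn_hours
        if learning and rng.random() < epsilon:
            i = rng.randrange(len(gains))                        # explore
        else:
            i = max(range(len(gains)), key=lambda j: value[j])   # exploit
        u = -gains[i] * x
        x_next = a[k] * x + b[k] * u + load[k]
        reward = -(x_next ** 2) - 0.1 * u ** 2
        if learning:  # estimates are frozen once the control phase starts
            count[i] += 1
            value[i] += (reward - value[i]) / count[i]
        log.append((k, "learn" if learning else "control", gains[i], x, reward))
        x = x_next
    return log

# Usage: learn for the first 500 hours, then control for the remaining 8260.
hours = 8760
log = run_year([0.9] * hours, [1.0] * hours, [1.0] * hours,
               gains=[0.0, 0.5, 0.9], learn_hours=500)
```

Since the value estimates stop changing after `learn_hours`, the control phase applies one fixed gain; a real agent would instead freeze (or keep slowly updating) its policy at that point.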
What I think I need to do is make my states global variables, so that I can train for one hour only, then advance the system to the next time step, and also store each action in an indexed global variable. Is that the right approach?
Emmanouil Tzorakoleftherakis on 13 May 2021 (edited 13 May 2021)
It looks like you want to do model-based RL, which is not supported out of the box right now. I would recommend learning the dynamics and the policy in separate training sessions. For learning the dynamics, you can use supervised learning, since you already have data.
There is also no out-of-the-box way to deploy "learning" yet (we are working on that).
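As one concrete reading of the suggestion to learn the dynamics with supervised learning, the sketch below fits a linear model x[k+1] ≈ A x[k] + B u[k] + E d[k] to logged data with ordinary least squares (`numpy.linalg.lstsq`). The linear, time-invariant model structure, the function name `fit_linear_dynamics`, and the synthetic data are assumptions; a time-varying or nonlinear model (for example, a neural network trained on the same input/target pairs) would follow the same pattern of regressing the next state on the current state, action, and load.

```python
import numpy as np

def fit_linear_dynamics(X, U, D):
    """Least-squares fit of x[k+1] = A x[k] + B u[k] + E d[k].

    X: (T+1, n) states, U: (T, m) actions, D: (T, p) loads.
    Returns A (n, n), B (n, m), E (n, p).
    """
    n, m = X.shape[1], U.shape[1]
    Z = np.hstack([X[:-1], U, D])          # regressors, one row per hour
    Y = X[1:]                              # targets: next state
    theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)  # Z @ theta ~ Y
    W = theta.T                            # (n, n + m + p)
    return W[:, :n], W[:, n:n + m], W[:, n + m:]

# Synthetic check with known (hypothetical) matrices and noiseless data.
rng = np.random.default_rng(1)
A_true = np.array([[0.9, 0.1], [0.0, 0.8]])
B_true = np.array([[1.0], [0.5]])
E_true = np.array([[0.2], [0.3]])
T = 200
U = rng.normal(size=(T, 1))
D = rng.uniform(size=(T, 1))
X = np.zeros((T + 1, 2))
for k in range(T):
    X[k + 1] = A_true @ X[k] + B_true @ U[k] + E_true @ D[k]

A_hat, B_hat, E_hat = fit_linear_dynamics(X, U, D)
```

Once a model like this is fitted, the policy can be trained against it in a separate session, which matches the recommended split between learning the dynamics and learning the policy.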