Multi-action agent programming in reinforcement learning

How can I program or represent a multi-action agent in reinforcement learning (DQN)? I can construct the agent, but I do not know how to represent an action with three decisions at every stage of learning in the step function. The action has three decisions: charging the battery, operating the first generator, and operating the second generator. The first part of the code below shows how I construct the environment; in the second part I ask how I can add these actions to my step function.
Thank you in advance.
First part
clc
% Observation: 4 continuous states (T, SOC, SOF, Temp)
ObservationInfo = rlNumericSpec([4 1]);
ObservationInfo.Name = 'EnergSolar States';
ObservationInfo.Description = 'T,SOC,SOF,Temp';
% Action: every combination of the three decisions
% (battery command in {-1,0,1}, generator 1 on/off, generator 2 on/off)
ActionInfo = rlFiniteSetSpec({[-1 0 0],[-1 1 0],[-1 0 1],[-1 1 1], ...
    [0 0 0],[0 1 0],[0 0 1],[0 1 1], ...
    [1 0 0],[1 1 0],[1 0 1],[1 1 1]});
ActionInfo.Name = 'EnergSolar Action';
env = rlFunctionEnv(ObservationInfo,ActionInfo,'myStepFunctionfuel','myResetFunctionfuel');
obsInfo = getObservationInfo(env);
numObservations = obsInfo.Dimension(1);
actInfo = getActionInfo(env);
% Critic network: a state path and an action path merged by an addition layer
statePath = [
    imageInputLayer([4 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(200,'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(200,'Name','CriticStateFC2')];
actionPath = [
    imageInputLayer([1 3 1],'Normalization','none','Name','action')
    fullyConnectedLayer(200,'Name','CriticActionFC1')];
commonPath = [
    additionLayer(2,'Name','add')
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(1,'Name','output')];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
criticOpts = rlRepresentationOptions('LearnRate',0.002,'GradientThreshold',1);
critic = rlRepresentation(criticNetwork,obsInfo,actInfo, ...
    'Observation',{'state'},'Action',{'action'},criticOpts);
% DQN agent and training options
agentOpts = rlDQNAgentOptions( ...
    'UseDoubleDQN',false, ...
    'TargetUpdateMethod',"periodic", ...
    'TargetUpdateFrequency',4, ...
    'ExperienceBufferLength',100000, ...
    'DiscountFactor',0.99, ...
    'MiniBatchSize',1000); % 500 to 1000
agent = rlDQNAgent(critic,agentOpts);
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes',1000, ...
    'MaxStepsPerEpisode',500, ...
    'Verbose',false, ...
    'Plots','training-progress', ...
    'StopTrainingCriteria','EpisodeReward', ...
    'StopTrainingValue',0, ...
    'ScoreAveragingWindowLength',5);
trainingStats = train(agent,env,trainOpts);
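As a quick sanity check (a sketch, not part of the original script), you can confirm the finite action set holds all twelve three-element vectors; the answer below also points to ActionInfo.Elements for this:
allActions = ActionInfo.Elements;   % cell array of the twelve combinations
disp(numel(allActions))             % prints 12
disp(allActions{1})                 % prints -1 0 0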
Second part
% Balance equations (body of the step function)
Pg = PL - Ppv - bpr*Action(1);   % Action(1): battery command (-1, 0, or 1)
if Pg > Z
    if Pg - Z <= 150
        % Small deficit: first generator alone
        PDG1 = Pg - Z;
        PDG2 = 0;
        F = A*PDG1 + B*Pr1;      % fuel cost, Pr1 for generator 1
        Pg = Z;
    elseif Pg - Z < 350
        % Medium deficit: second generator alone
        PDG2 = Pg - Z;
        PDG1 = 0;
        F = A*PDG2 + B*Pr2;
        Pg = Z;
    elseif Pg - Z < 500
        % Large deficit: second generator at its 350 limit,
        % first generator takes the remainder if switched on (Action(2))
        PDG2 = 350;
        PDG1 = (Pg - Z - PDG2)*Action(2);
        F = A*(PDG1 + PDG2) + B*(Pr1*Action(2) + Pr2*Action(3));
        Pg = Pg - Z - PDG1 - PDG2;
    end
end
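To wire the three decisions into the environment, the balance code above would live inside the step function that rlFunctionEnv calls. A minimal skeleton follows; it is a sketch, and the LoggedSignals fields, reward, observation layout, and termination logic are assumptions, not the asker's actual code:
function [NextObs,Reward,IsDone,LoggedSignals] = myStepFunctionfuel(Action,LoggedSignals)
% Action is the selected element of the finite set, e.g. [-1 0 1]:
% Action(1) = battery command, Action(2) = generator 1, Action(3) = generator 2
SOC = LoggedSignals.SOC + 200*Action(1);   % battery update from the question
% ... insert the balance equations above, using Action(2) and Action(3) ...
LoggedSignals.SOC = SOC;
NextObs = [LoggedSignals.T; SOC; LoggedSignals.SOF; LoggedSignals.Temp]; % assumed state layout
Reward  = 0;       % placeholder reward
IsDone  = false;   % placeholder termination flag
end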

Answers (1)

Emmanouil Tzorakoleftherakis
This example shows how to create an environment with multiple discrete actions. Hope that helps.
  3 Comments
Emmanouil Tzorakoleftherakis
All the elements are in ActionInfo.Elements. Is that what you need?
Nabil Jalil Aklo on 14 Jul 2020
Let me explain what I need in this example.
Suppose the action vector consists of three elements at a time:
ActionInfo = rlFiniteSetSpec({[-1 0 0],[-1 1 0],[-1 0 1],[-1 1 1],[0 0 0],[0 1 0],[0 0 1],[0 1 1],[1 0 0],[1 1 0],[1 0 1],[1 1 1]});
At some time step, suppose the action vector becomes Action = [-1 0 1]. These elements represent three decisions: battery charging control, first generator control, and second generator control. Meanwhile, I want to apply the first element of this vector in the equation below:
SOC = SOC + 200*(first element of the action vector)
The question is: how can I extract the first element from the vector?
Thank you in advance.
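A minimal sketch of the extraction, assuming the step function receives the chosen set element as a 1-by-3 row vector (standard MATLAB indexing applies):
% Inside the step function, Action is e.g. [-1 0 1]
batteryCmd = Action(1);        % first element of the action vector
SOC = SOC + 200*batteryCmd;    % the update from the question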

