[Reinforcement Learning] Deny an action already taken by the agent

Hi all! I have a problem with the step function in my environment. I would like to prevent the agent from choosing an action it has already taken, so that the algorithm does not keep repeating the same mistake. In this case the actions are the removal of nodes from a graph with 8 nodes, so there are 8 possible actions. I am attaching the code. Thanks for your help :)
classdef FinderEnvi_T < rl.env.MATLABEnvironment
%MYENVIRONMENT: Template for defining custom environment in MATLAB.
%% Properties (set properties' attributes accordingly)
properties
maxdisconnectN=3;
mingcc=3;
N=8;
end
properties
% Initialize system state
State = zeros(8,2);
% Adjacency matrix of the current graph (used by reset and step)
A = zeros(8);
end
properties(Access = protected)
% Initialize internal flag to indicate episode termination
IsDone = false
end
%% Necessary Methods
methods
% Constructor method creates an instance of the environment
% Change class name and constructor name accordingly
function this = FinderEnvi_T()
% Initialize Observation settings
N=8;
ObservationInfo = rlNumericSpec([N 2]);
ObservationInfo.Name = 'Network state';
ObservationInfo.Description = 'Adj Matrix State';
% Initialize Action settings
ActionInfo = rlFiniteSetSpec([1:1:N]);
ActionInfo.Name = 'Node removal Action';
% The following line implements built-in functions of RL env
this = this@rl.env.MATLABEnvironment(ObservationInfo,ActionInfo);
end
% Reset environment to initial state and output initial observation
function InitialObservation = reset(this)
%Random graph generation
A = round(rand(this.N));
A = triu(A) + triu(A,1)';
A = A - diag(diag(A));
this.A=A;
%Node degree
[deg,indeg,outdeg]=degrees(A);
%Local clustering coefficient
[C1,C2,C] = clustCoeff(A);
InitialObservation = [deg' C];
this.State = InitialObservation;
% (optional) use notifyEnvUpdated to signal that the
% environment has been updated (e.g. to update visualization)
notifyEnvUpdated(this);
end
% Apply system dynamics and simulates the environment with the
% given action for one step.
function [Observation,Reward,IsDone,LoggedSignals] = step(this,Action)
LoggedSignals = [];
%check if the given action is valid
%Here i need help
%Node Disconnection
A=this.A;
this.State(Action,:)=zeros(1,size(this.State,2));
A(Action,:)=zeros(1,length(A));
A(:,Action)=zeros(length(A),1);
this.A=A;
%Giant connecting component
[gcc] = largestcomponent(A);
%Node degree
[deg,indeg,outdeg]=degrees(A);
%Local clustering coefficient
[C1,C2,C] = clustCoeff(A);
% Update system states
Observation=[deg' C];
this.State=Observation;
% Check terminal condition
disconnectedN= sum(Observation(:,1)==0); % count how many nodes in the degree vector have degree 0
IsDone =disconnectedN >= this.maxdisconnectN || gcc<=this.mingcc;
this.IsDone = IsDone;
% Get reward
Reward = 1/length(this.A) * gcc/length(this.A);
% (optional) use notifyEnvUpdated to signal that the
% environment has been updated (e.g. to update visualization)
notifyEnvUpdated(this);
end
% (optional) Visualization method
function plot(this)
% Initiate the visualization
% Update the visualization
envUpdatedCallback(this)
end
end
methods (Access = protected)
% (optional) update visualization every time the environment is updated
function envUpdatedCallback(this)
end
end
end

Answers (1)

Aditya on 26 Feb 2024
It seems you want to prevent the reinforcement learning (RL) agent from taking an action that has already been taken in the current episode, which in your case is removing a node that has already been removed from the graph. To achieve this, you will need to modify your environment to keep track of the actions taken and to provide a signal or a penalty to the agent when it attempts to take an invalid action.
Here's a modified version of the step function that includes a check for whether the node has already been removed. If the node has already been removed, it provides a negative reward and sets the IsDone flag to true to end the episode.
function [Observation, Reward, IsDone, LoggedSignals] = step(this, Action)
LoggedSignals = [];
% Check if the given action is valid
if this.State(Action,1) == 0
% The node has already been removed, give a large negative reward and end the episode
Reward = -1;
IsDone = true;
Observation = this.State;
else
% Node Disconnection
A = this.A;
this.State(Action,:) = zeros(1, size(this.State, 2));
A(Action,:) = zeros(1, length(A));
A(:,Action) = zeros(length(A), 1);
this.A = A;
% Giant connected component
[gcc] = largestcomponent(A);
% Node degree
[deg, indeg, outdeg] = degrees(A);
% Local clustering coefficient
[C1, C2, C] = clustCoeff(A);
% Update system states
Observation = [deg' C];
this.State = Observation;
% Check terminal condition
disconnectedN = sum(Observation(:,1) == 0); % Count how many nodes have degree 0
IsDone = disconnectedN >= this.maxdisconnectN || gcc <= this.mingcc;
this.IsDone = IsDone;
% Get reward
Reward = 1 / length(this.A) * gcc / length(this.A);
end
% (optional) use notifyEnvUpdated to signal that the environment has been updated (e.g. to update visualization)
notifyEnvUpdated(this);
end
In this modification, before proceeding with removing a node, the step function checks if the node has already been removed by looking at the State property. If the degree of the node (first column in State) is zero, it means the node has already been removed, and the function provides a negative reward and ends the episode.
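As a quick sanity check after editing the step method, you can let the toolbox exercise the environment once before training. A minimal example, assuming the class above is on the MATLAB path and you have Reinforcement Learning Toolbox:
env = FinderEnvi_T();      % construct the custom environment
validateEnvironment(env);  % runs reset and step once and checks the outputs against the observation/action specs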
Please note that ending the episode might not be the best strategy for training an RL agent, as it could lead to the agent learning to avoid the penalty by not taking any action at all. A better approach might be to provide a negative reward but allow the episode to continue, or to implement a masking mechanism that only presents the agent with valid actions at each step.
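For reference, a minimal sketch of that first alternative (penalize the invalid action but keep the episode running) could look like the version of step below. It reuses the same helper functions (largestcomponent, degrees, clustCoeff) as your code; the penalty of -1 is an arbitrary value you would need to tune.
function [Observation, Reward, IsDone, LoggedSignals] = step(this, Action)
LoggedSignals = [];
if this.State(Action,1) == 0
% Invalid action: the node is already disconnected. Penalize the agent
% but keep the episode alive so it can try a different node next step.
Reward = -1; % assumed penalty value, to be tuned
Observation = this.State; % state is unchanged
IsDone = false;
this.IsDone = IsDone;
else
% Valid action: remove the node exactly as in the original step function
A = this.A;
this.State(Action,:) = zeros(1, size(this.State, 2));
A(Action,:) = zeros(1, length(A));
A(:,Action) = zeros(length(A), 1);
this.A = A;
[gcc] = largestcomponent(A);
[deg, ~, ~] = degrees(A);
[~, ~, C] = clustCoeff(A);
Observation = [deg' C];
this.State = Observation;
disconnectedN = sum(Observation(:,1) == 0); % nodes with degree 0
IsDone = disconnectedN >= this.maxdisconnectN || gcc <= this.mingcc;
this.IsDone = IsDone;
Reward = 1 / length(this.A) * gcc / length(this.A);
end
notifyEnvUpdated(this);
end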

Release

R2021b
