How to let the reinforcement learning agent know exactly what action it takes?

50 views (last 30 days)
Dear Matlab Experts,
I am currently running a reinforcement learning simulation, integrated with a discrete events system of simulink. My main simulation of the discrete events utilizes bus element containing multiple entites that some will serve as an observation for the RL agent (via conversion entity -> signal) and to impose the action the RL agent chooses (via conversion signal -> entity). I imposed some policy in the DES where given a certain requirements, the entity value will be assigned, that will switch an entity gate to determine which course of action to take. However, my reinforcement learning agent does not seem to understand this rule, as it assigns the entity value randomly from the values available. Is there a way to apply this rule that is present in the DES, to somehow make the same rule understandable by the RL agent?
Thank you so much in advance! I am attaching my model for reference.
Best regards,
Aaron.

Answers (1)

Maneet Kaur Bagga
Maneet Kaur Bagga 17 minutes ago
Hi,
As per my understanding, the issue is encountered because your DES contains specific policies, such as switching gates based on entity attributes. These rules are likely hard-coded and not inherently part of the RL environment's observation or reward structure. The RL agent explores actions based on the provided observations and the learned policy.
Please refer to the following workaround for the same:
Incorporate Rule into Observations: Add flags or variables that indicate the rule's state (e.g "Gate should switch" = 1/0). Ensure these conditions are dynamically updated during simulation.
Augment Reward Structure: Add a penalty or reward for actions that align with or violate the DES rules. This encourages the RL agent to learn behaviors aligned with the rules.
reward = reward + (agentAction == expectedAction) * rewardFactor;
Pretrain the Agent: Use supervised learning to pretrain the RL agent to follow the DES rules as a baseline policy. Later, fine-tune with reinforcement learning.
Custom Environment Dynamics: Modify the environment (DES model) such that the DES rules are enforced during interaction. For instance, override the agent’s selected action if it violates a rule.
if violatesRule(action, currentState)
action = enforceRule(currentState);
end
Regularization: Include constraints in the training process that mimic the DES rules. For example, ensure that the policy network outputs actions adhering to the rules.
loss = loss + ruleViolationPenalty * countViolations(actions, state);
Rule-Based Hybrid Approach: Use "rlAgent.getAction" to test the agent's action in specific scenarios and compare it against the DES policy to identify mismatches.
Please refer to the following MathWorks documentation of "rlAcAgent.getAction" for better understanding:
Hope this helps!

Categories

Find more on Discrete-Event Simulation in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!