- Why?
- To gain an understanding of more complex RL algorithms working with the FlexSim model, which involves state machine logic, multi-agent passenger management, and a complex observation space
- How?
- Read the state machine diagrams to understand how the learning algorithm interfaces with the FlexSim model
- Read about how the system uses action masking (a vector marking which actions are currently valid) so the agent only outputs decisions it can actually act upon
- Ran the corresponding Python files to see it in action (the environment, the training, and inference with the trained model)
- Learned the `tensorboard --logdir=.` command to open the TensorBoard interface and view model results without going through the command palette
- Noticed the difference in the FlexSim model's performance when the trained RL model makes decisions rather than random ones
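The action-masking idea above can be sketched in a few lines. This is a hypothetical illustration, not the actual FlexSim integration: a common approach is to set the logits of invalid actions to negative infinity before the softmax, so they receive zero probability. The logit values and mask here are made up for the example.

```python
import numpy as np

def masked_softmax(logits, mask):
    """Return action probabilities with invalid actions zeroed out.

    logits: raw policy scores, shape (n_actions,)
    mask:   1 for valid actions, 0 for invalid, shape (n_actions,)
    """
    valid = mask.astype(bool)
    # Invalid actions get -inf, so exp(-inf) = 0 probability
    masked = np.where(valid, logits, -np.inf)
    # Subtract the max valid logit for numerical stability
    exp = np.exp(masked - masked[valid].max())
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.5, 3.0])
mask = np.array([1, 0, 1, 0])  # only actions 0 and 2 are currently valid
probs = masked_softmax(logits, mask)
print(probs)  # actions 1 and 3 end up with probability 0
```

The sampled action is then drawn only from the valid entries, which is why the decision vector in the notes above never selects an action the model cannot execute.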
- Importance?
- Completing and understanding this bridges the simpler earlier concepts (from HelloWorld) to the more complex agent models to study in the future
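The trained-versus-random comparison noted under "How?" can be sketched with a toy stand-in environment. This is not the FlexSim model; `ToyEnv` is an invented example where one action is always best, just to show the shape of the evaluation loop.

```python
import random

class ToyEnv:
    """Minimal stand-in environment: action 1 yields reward 1, others 0."""
    def __init__(self, horizon=10):
        self.horizon = horizon

    def reset(self):
        self.t = 0
        return 0  # trivial observation

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= self.horizon
        return 0, reward, done

def rollout(env, policy):
    """Run one episode with the given policy and return total reward."""
    obs, total, done = env.reset(), 0.0, False
    while not done:
        obs, r, done = env.step(policy(obs))
        total += r
    return total

random.seed(0)
random_return = rollout(ToyEnv(), lambda obs: random.choice([0, 1, 2]))
trained_return = rollout(ToyEnv(), lambda obs: 1)  # stand-in "trained" policy
print(random_return, trained_return)
```

In the real setup, the random policy corresponds to FlexSim taking arbitrary decisions, while the trained policy corresponds to inference with the saved RL model; the gap between the two returns is the performance difference observed in the notes.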



