Hi! I’m using the OpenAI Baselines code to try to solve a mathematical problem, for which I built my own environment. In many states only a few actions are legal. I am aware that for an illegal action I can return the same state and a zero reward, which effectively ignores it. Can anyone help me tweak the OpenAI Baselines code so that it queries the environment on whether an action is legal before initializing the Q-weights for it? Is this even possible or desirable, or is ignoring the illegal moves the better idea?
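To make the question concrete, here is a minimal sketch of the kind of masking I have in mind. It is plain Python rather than Baselines code, and `masked_greedy_action`, `q_values` and `legal_mask` are hypothetical names: I am assuming the environment can hand back a boolean legality mask for the current state.

```python
import math
import random

def masked_greedy_action(q_values, legal_mask, epsilon=0.0, rng=random):
    """Epsilon-greedy choice restricted to legal actions.

    q_values   -- list of Q-value estimates, one per action
    legal_mask -- list of booleans, True where the action is legal
                  (hypothetical: assumes the environment can report this)
    """
    legal = [a for a, ok in enumerate(legal_mask) if ok]
    if not legal:
        raise ValueError("no legal action in this state")
    # Explore only among the legal actions.
    if rng.random() < epsilon:
        return rng.choice(legal)
    # Exploit: illegal actions get -inf so they can never win the argmax.
    masked = [q if ok else -math.inf for q, ok in zip(q_values, legal_mask)]
    return max(range(len(masked)), key=masked.__getitem__)

q = [0.2, 1.5, -0.3, 0.9]
mask = [True, False, False, True]
# Action 1 has the highest Q-value but is illegal, so it is skipped.
best = masked_greedy_action(q, mask)
```

If something like this is the right approach, I suppose the same mask would also have to be applied to the max over next-state Q-values in the training target, since otherwise the target could still bootstrap from illegal actions.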
Another question: when using the MLP net from OpenAI Baselines, is there a simple way to plot or otherwise inspect what the network has learned, so that I can check whether training is going in a desirable direction?
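One check I could imagine, sketched below in plain Python, is to evaluate the learned Q-function on a handful of states where I know the right answer and list the greedy action next to its Q-value. `q_function` is a hypothetical callable standing in for the trained network, and `toy_q` is only a placeholder so the sketch runs; neither is part of Baselines.

```python
def greedy_policy_table(q_function, states):
    """Return (state, greedy_action, max_q) for each probe state."""
    rows = []
    for s in states:
        q = list(q_function(s))
        best = max(range(len(q)), key=q.__getitem__)
        rows.append((s, best, q[best]))
    return rows

def toy_q(state):
    # Placeholder for the trained network: two actions,
    # action 1 is preferred for positive states, action 0 otherwise.
    return [-float(state), float(state)]

table = greedy_policy_table(toy_q, [-2, 1, 3])
for state, action, value in table:
    print(f"state={state:>3}  greedy_action={action}  max_q={value:.2f}")
```

A table like this could then be fed to any plotting tool, but I don't know whether there is a more standard way of doing it.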