Write an environment for a closed-loop system


Hi guys,

Suppose one trains one of the systems, e.g. CartPole-v0, and generates a policy that balances the pole vertically in the middle of the scene. Suppose one then wants to define a new environment consisting of this policy together with the original CartPole-v0 system. What is the right approach for this scenario?
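In case it helps clarify the question, here is one possible reading sketched in code: wrap the original environment and the frozen policy together so the pair behaves like a single new environment. This is a minimal sketch, not the official `gym.Env` API; the "outer" action here is a hypothetical disturbance signal (0 = let the trained policy act, 1 = flip its action), which you'd replace with whatever interface your new environment should expose.

```python
class ClosedLoopEnv:
    """Sketch of a closed-loop environment: an inner env plus a frozen
    trained policy, exposed together as one new environment.

    `inner` is anything with reset()/step(action) (e.g. a Gym env);
    `policy` is the frozen trained policy, mapping observation -> action.
    The outer action is a hypothetical disturbance: 0 = leave the
    policy's action alone, 1 = flip it.
    """

    def __init__(self, inner, policy):
        self.inner = inner
        self.policy = policy

    def reset(self):
        self.obs = self.inner.reset()
        return self.obs

    def step(self, outer_action):
        base = self.policy(self.obs)            # what the trained policy would do
        action = 1 - base if outer_action == 1 else base
        self.obs, reward, done, info = self.inner.step(action)
        return self.obs, reward, done, info
```

An outer agent trained against `ClosedLoopEnv` would then be learning on top of the already-trained controller rather than on the raw cart-pole dynamics.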


I'm not entirely sure what you're asking, but it sounds like you really just want to save your Q table (assuming you're using Q-learning), or the weights if you're using something like Keras or another NN library.
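For the Q-table case that's just a numpy save/load round trip; the table shape below is a made-up example (discretised CartPole states x 2 actions):

```python
import os
import tempfile

import numpy as np

# Hypothetical Q table: 50 discretised states x 2 actions.
q_table = np.random.rand(50, 2)

path = os.path.join(tempfile.gettempdir(), "cartpole_q.npy")
np.save(path, q_table)      # after training finishes
restored = np.load(path)    # at the start of a later run

# With Keras you'd use model.save_weights(path) / model.load_weights(path)
# instead of np.save / np.load.
```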

You could then basically do a check up front: if the saved weights file exists, load it and run; otherwise train from scratch. You'll probably need a few modifications so you aren't back to high epsilon values and exploring a ton, since you've presumably already done that exploration.
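That check-up-front idea might look like this; the epsilon values and table shape are arbitrary placeholders, the point is just that a resumed run should start near-greedy while a fresh run starts exploratory:

```python
import os

import numpy as np

def load_or_init(path, shape=(50, 2)):
    """Load a previously saved Q table if one exists, else start fresh.

    Returns (q_table, epsilon).  The epsilons are hypothetical: low when
    resuming, since the exploration phase is already done, and high when
    training from scratch.
    """
    if os.path.exists(path):
        return np.load(path), 0.01   # resume: mostly exploit
    return np.zeros(shape), 1.0      # fresh start: explore heavily
```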