CartPole-v0 agent gets really good at rolling out of the game?



I made a policy-based agent that plays CartPole-v0.

I thought an agent could easily master the game and have an episode that lasts an infinite amount of time.

To test that out on mine, I registered a new environment to remove the “winning” limitations:

import gym

# Register a CartPole variant with no step cap and no "solved" reward threshold.
gym.envs.register(
    id='CartPoleMoreSteps-v0',
    entry_point='gym.envs.classic_control:CartPoleEnv',
    max_episode_steps=None,
    reward_threshold=None
)

My agent is just a simple two-layer dense network that updates after every run using discounted rewards.
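For readers unfamiliar with the term, a minimal sketch of computing discounted rewards for one episode might look like the following. The function name `discounted_returns` and the example episodes are illustrative assumptions, not the agent's actual code; the only environment fact used is that CartPole gives a reward of 1 for every step the episode keeps running.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for each step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical episodes: one very short, one long, both with reward 1.0 per step.
short_episode = discounted_returns([1.0] * 3, gamma=0.99)
long_episode = discounted_returns([1.0] * 5000, gamma=0.99)

print(short_episode)    # one discounted return per step of the short episode
print(long_episode[0])  # discounted return seen from the first step
```

With a reward of 1 per step and a discount factor of 0.99, the return at the first step approaches 1 / (1 - 0.99) = 100 as episodes get longer, so the early steps of very long episodes all receive nearly the same discounted return regardless of the exact episode length.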

This agent gets quite good: after about 15 minutes of training, it reaches an average reward per episode of 30000, using a discount factor of 0.99.

However, after a whole night of training, it only gets an average reward of about 5700 per episode, and all it does is get out of the level fast.


Any idea how I incentivized that behavior?