Hi, everyone.
I’m implementing a basic double DQN on Cartpole_v0. The network has one hidden layer with 64 neurons, and during training I got the following curve of accumulated reward per episode.
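To be concrete about what "double DQN" means here, below is a minimal sketch of the target computation as it is usually defined: the online network picks the greedy next action and the target network evaluates it. The function name, argument names, and the default `gamma` are made up for illustration and are not taken from the actual training code.

```python
def double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.99):
    """Compute double DQN regression targets for a batch of transitions.

    rewards:        list of rewards, one per transition
    dones:          list of booleans, True if the next state is terminal
    q_next_online:  per-transition list of Q-values for the next state
                    from the online network
    q_next_target:  per-transition list of Q-values for the next state
                    from the target network
    gamma:          discount factor (0.99 is an assumed default)
    """
    targets = []
    for r, done, q_on, q_tg in zip(rewards, dones, q_next_online, q_next_target):
        if done:
            # No bootstrapping from a terminal state.
            targets.append(float(r))
            continue
        # The online network selects the greedy next action ...
        best_action = max(range(len(q_on)), key=lambda a: q_on[a])
        # ... and the target network evaluates that action.
        targets.append(r + gamma * q_tg[best_action])
    return targets
```

The only difference from plain DQN is that the action index comes from `q_next_online` rather than from taking the maximum over `q_next_target` directly.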
At the beginning the curve goes up and everything is fine. After the accumulated reward reaches its maximum of 200, it occasionally drops, as can be seen in the figure, and some of the drops are very severe. When I stopped training while the curve was at the top, the resulting policy performed perfectly.
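Stopping "at the top" can be made less manual with a simple rule on the recent episode returns. The following is only a sketch of one such rule, assuming a moving average over the last few episodes; the helper name `should_stop`, the window of 20 episodes, and the threshold of 195 are illustrative choices, not values from my run.

```python
def should_stop(episode_returns, window=20, threshold=195.0):
    """Return True once the mean of the last `window` returns reaches `threshold`.

    episode_returns: list of accumulated rewards, one per finished episode
    window:          number of most recent episodes to average (assumed value)
    threshold:       mean return at which training is stopped (assumed value)
    """
    if len(episode_returns) < window:
        # Not enough episodes yet to judge.
        return False
    recent = episode_returns[-window:]
    return sum(recent) / window >= threshold


# Usage inside a training loop (schematic):
#   returns.append(episode_return)
#   if should_stop(returns):
#       break  # keep the current weights
```

Averaging over a window avoids stopping on a single lucky episode, but it does not explain why the return drops if training continues.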
Does anyone have an idea why these drops appear even though the reward is already high? Or is it because I have been training for too long?