Gym: Pendulum-v0 not solvable by vanilla policy gradient? Increase max torque?


#1

The default max torque is +/- 2 and the max speed is +/- 8. According to some solutions, the pendulum has to swing back and forth several times before it can balance upright. I suspect this is not solvable by vanilla policy gradient with a 1-layer MLP with 50 neurons. What would be good values for max torque and max speed so that the pendulum needs to swing only once or twice to balance upright?
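To make the question concrete, here is a rough sketch that counts how many direction reversals a simple "push in the direction of motion" controller needs before the pendulum crosses upright, for a given max torque. It does not use Gym itself. The constants (g = 10, m = 1, l = 1, dt = 0.05) and the update rule are my recollection of the Pendulum-v0 source and should be treated as assumptions; `step` and `count_swings` are hypothetical helper names. Under those assumed dynamics, a torque of m*g*l/2 = 5 is enough to overcome gravity at every angle, so no swinging is needed at all; values between 2 and 5 are where "once or twice" would be found.

```python
import math

# Assumed Pendulum-v0 constants (recalled from the Gym source, not verified here).
G, M, L, DT = 10.0, 1.0, 1.0, 0.05


def step(th, thdot, u, max_torque, max_speed):
    """One Euler step of the assumed dynamics; th = 0 is upright, th = pi hangs down."""
    u = max(-max_torque, min(max_torque, u))
    thdot = thdot + (-3 * G / (2 * L) * math.sin(th + math.pi)
                     + 3.0 / (M * L ** 2) * u) * DT
    th = th + thdot * DT
    thdot = max(-max_speed, min(max_speed, thdot))
    return th, thdot


def count_swings(max_torque, max_speed=8.0, max_steps=2000):
    """Direction reversals before crossing upright, or None if never reached.

    Controller: always apply full torque in the current direction of motion,
    flipping the torque whenever the pendulum stalls and starts falling back.
    """
    th, thdot = math.pi, 0.0  # start hanging straight down, at rest
    direction = 1.0
    reversals = 0
    for _ in range(max_steps):
        th, thdot = step(th, thdot, direction * max_torque, max_torque, max_speed)
        if thdot * direction < 0:  # stalled and reversed: push the other way
            direction = -direction
            reversals += 1
        if th <= 0.0 or th >= 2 * math.pi:  # crossed the upright position
            return reversals
    return None


if __name__ == "__main__":
    # Sweep a few candidate torque limits between the default and the static threshold.
    for torque in (2.0, 3.0, 4.0, 5.0):
        print(torque, count_swings(torque))
```

This only says how many swings are physically required under full-torque pumping; it says nothing about whether vanilla policy gradient can actually learn the resulting task.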