Interpretation of Pong-v0 Reward and Action


Although I have a working algorithm for Pong-v0, I have difficulty understanding exactly what the environment is doing:

  • action: Pong needs only 3 actions (up, down, and not moving), but the action space has 6. Actions 0, 2, and 4 appear to do the same thing as actions 1, 3, and 5, and I cannot see why the environment works this way.
  • reward: I always take action 0 (not moving) and print every non-zero reward to stdout while watching the game. The non-zero rewards are not synchronized with what happens on the game screen, and the total reward is -11 while the score shown on the game screen is 20:0.
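
One way to probe both points is to look at the action labels and to tally the reward stream by sign. The sketch below is only a rough illustration and does not need Gym installed. It assumes that `env.unwrapped.get_action_meanings()` reports the six labels hard-coded in `PONG_ACTION_MEANINGS` for Pong, and that the environment emits +1 when the agent scores and -1 when the opponent scores. The helper names `group_by_movement` and `tally_points` are made up for this example.

```python
from collections import defaultdict

# Assumption: the labels that env.unwrapped.get_action_meanings()
# reports for Pong in Gym's Atari environments.
PONG_ACTION_MEANINGS = ["NOOP", "FIRE", "RIGHT", "LEFT", "RIGHTFIRE", "LEFTFIRE"]


def group_by_movement(meanings):
    """Group action indices by paddle movement, ignoring the FIRE button."""
    groups = defaultdict(list)
    for index, name in enumerate(meanings):
        # Strip the FIRE suffix; a bare "FIRE" is treated as no movement.
        movement = name.replace("FIRE", "") or "NOOP"
        groups[movement].append(index)
    return dict(groups)


def tally_points(rewards):
    """Split a reward stream into (agent_points, opponent_points, total).

    Assumption: +1 means the agent scored, -1 means the opponent scored.
    """
    agent = sum(1 for r in rewards if r > 0)
    opponent = sum(1 for r in rewards if r < 0)
    return agent, opponent, sum(rewards)


if __name__ == "__main__":
    # Which action indices share the same paddle movement?
    print(group_by_movement(PONG_ACTION_MEANINGS))
    # Made-up reward stream, for illustration only.
    print(tally_points([0, 0, -1, 0, 1, -1]))
```

Under these assumptions the six actions collapse into three groups (no movement and the two movement directions), which would account for the duplicates if the FIRE button has no effect on the paddle. As for the timing, Pong-v0 repeats each action for a randomly chosen 2 to 4 emulator frames per `step` call, so a reward printed for one step may correspond to a frame that was never rendered; this could be part of the mismatch, though it does not by itself explain the -11 versus 20:0 discrepancy.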


I have the same confusion. Can anyone with experience help explain? Thanks.