I trained the benchmark A2C gym algorithm on Breakout. Training converts the game to the DeepMind format, where the output score at each step is zero or one. What is the proper way to load this trained model and play the game to see whether the scores match the benchmark?
I tried the following, but the agent does not perform well. It seemed to do better during training, so I suspect something in my evaluation setup is wrong, perhaps the grayscale conversion, the number of frames being skipped, or something similar. Any suggestions or reference code would be appreciated.
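To illustrate what I mean by the zero-or-one score: as far as I understand, the DeepMind-style preprocessing clips each reward to its sign, so a clipped episode total counts reward events rather than game points. Below is a minimal sketch of that idea. The helper name `clip_reward` and the reward values are made up for illustration and are not taken from the library.

```python
def clip_reward(reward):
    """Return the sign of the reward: -1, 0, or +1 (hypothetical helper)."""
    return (reward > 0) - (reward < 0)


# Made-up per-step rewards for one short episode (illustrative only).
raw_rewards = [0, 1, 0, 4, 0, 7, 0]

# Total as the game would report it.
raw_total = sum(raw_rewards)

# Total after clipping each step's reward to its sign.
clipped_total = sum(clip_reward(r) for r in raw_rewards)

print("raw total:", raw_total)
print("clipped total:", clipped_total)
```

If the benchmark reports raw game scores, a total accumulated from clipped rewards would not be directly comparable to it.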
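
    import gym
    import numpy as np
    # Assumption: wrap_deepmind comes from OpenAI baselines; adjust if yours lives elsewhere.
    from baselines.common.atari_wrappers import wrap_deepmind

    def play_episode(env_name, model, seed):
        env = gym.make(env_name)
        env.seed(seed)  # seed is a method; assigning to it does not seed the env
        env = wrap_deepmind(env, frame_stack=True, scale=True)
        obs, states, done = env.reset(), None, False
        episode_rew = 0
        while not done:
            env.render()
            obs = np.reshape(obs, (1, 84, 84, 4))
            # states are used by the LSTM model only
            action, value, states, _ = model.step(obs, states, done)
            obs, rew, done, _ = env.step(action)
            episode_rew += rew
        env.close()  # close must be called, not just referenced
        print("Episode reward", episode_rew)
        return episode_rew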