DQN and Policy Gradient Architectures


I have two questions:

  1. What is the ideal architecture for a classic DQN playing an Atari game? By this I mean the hyperparameters of the neural network: number of layers, kernel sizes, minibatch size, replay buffer size, and so on. I can't seem to find this information in DeepMind's paper! (I've put my best-guess sketch after this list.)

  2. Why does the policy gradient code used to play Pong use a fully connected neural net rather than a CNN? Has anyone tried it with a CNN? (I've sketched a possible convolutional version below.)
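
For question 1, here is what I've pieced together so far from various implementations; it's a minimal PyTorch sketch of what I believe is the Nature-paper (Mnih et al., 2015) architecture. The layer sizes and training hyperparameters in the comments are my reading of secondary sources, so please correct anything I've got wrong:

```python
# Best-guess sketch of the "Nature DQN" network. Input is a stack of the
# 4 most recent 84x84 grayscale frames; output is one Q-value per action.
# Values here are what I've found quoted elsewhere, not verified against
# DeepMind's original code.
import torch
import torch.nn as nn

class NatureDQN(nn.Module):
    def __init__(self, num_actions: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4),   # 84x84 -> 20x20
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2),  # 20x20 -> 9x9
            nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1),  # 9x9 -> 7x7
            nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512),
            nn.ReLU(),
            nn.Linear(512, num_actions),  # Q-value for each action
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 4, 84, 84), raw pixel values in [0, 255]
        return self.head(self.features(x / 255.0))

# Training hyperparameters I've seen quoted for this setup (again, my
# best guess): minibatch size 32, replay buffer of 1M transitions, target
# network synced every 10k steps, RMSProp with lr 2.5e-4, gamma 0.99,
# epsilon annealed from 1.0 to 0.1 over the first 1M frames.
```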
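For question 2, here is a rough sketch of what a convolutional policy for Pong might look like, using the same 80x80 preprocessed difference frame the fully connected version takes. The conv layer sizes are just my guesses, not from any published code; I'd be curious whether anyone has trained something like this:

```python
# Hypothetical CNN policy for Pong, as an alternative to the 2-layer
# fully connected net. Outputs the probability of taking the UP action.
import torch
import torch.nn as nn

class ConvPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=8, stride=4),   # 80x80 -> 19x19
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),  # 19x19 -> 8x8
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, 1),  # single logit for P(move UP)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, 80, 80) preprocessed difference frames
        return torch.sigmoid(self.net(x))
```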