Designing good representations for both input and output is a bit of an art form.
I think the Atari platform was chosen quite deliberately to demonstrate a general learning approach, because all the games share a common controller. Actions in the DQN Atari paper consist of choices of which button to press on the controller. The action space is the same for all games, so the same output layer works for all of them, and it was therefore possible to run the exact same agent against every game.
In general in reinforcement learning, you have two broad choices for how to represent an action:

- Express the action choice in the output of the policy or value function (e.g. one output per action).
- Express the action as part of the input, and have the function output a value for that state/action pair.
Which one has the most merit depends on how easy the representation is to code, plus how easy it is for a learning agent to work with. Simple action spaces, where the actions are finite and can be enumerated consistently, can in theory all share the same general output model, such as a softmax layer. A lot of the teaching literature for RL uses these kinds of action spaces, because dealing with the complexities of representation is a distraction from learning about the RL techniques themselves.
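As a minimal sketch of such a shared output model, here is a linear policy with a softmax head over an enumerated action set. The feature size, action count and linear weights are all illustrative assumptions, not from any specific paper:

```python
import numpy as np

# Illustrative sizes, not from any real environment.
N_FEATURES = 8
N_ACTIONS = 4  # e.g. up/down/left/right

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(N_ACTIONS, N_FEATURES))  # linear policy weights
b = np.zeros(N_ACTIONS)

def softmax(logits):
    z = logits - logits.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def action_probabilities(state_features):
    """One probability per enumerated action, summing to 1."""
    return softmax(W @ state_features + b)

probs = action_probabilities(rng.normal(size=N_FEATURES))
```

The point is that any environment sharing this enumerated action set can reuse the same output shape, regardless of what the actions mean inside each game.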
In environments like classic board games, you have two ways to cope with the “over-expression” this leads to (i.e. where the encoding allows you to specify large numbers of illegal moves):
With the action choice expressed in the output, you can filter and re-normalise action choices based on what the environment allows. It doesn’t matter if the neural network assigns probability to an illegal action; you simply ignore it at the point of action selection. The network will never learn about that action in situations where it is not allowed, so you have to keep masking it out at selection time.
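The masking and re-normalising step can be sketched in a few lines. Here `probs` stands in for the network’s softmax output and `legal` for the environment’s legal-move mask; both arrays are made-up examples:

```python
import numpy as np

probs = np.array([0.5, 0.2, 0.2, 0.1])       # network output over all actions
legal = np.array([True, False, True, True])  # environment's legal-move mask

masked = np.where(legal, probs, 0.0)  # zero out illegal actions
masked /= masked.sum()                # re-normalise over the legal ones

# Sample only from the re-normalised distribution.
action = np.random.default_rng(0).choice(len(probs), p=masked)
assert legal[action]  # an illegal action can never be sampled
```

Note the division re-normalises in proportion to the original probabilities, so the network’s relative preferences among legal actions are preserved.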
With the action choice expressed in the input, you can assess the value of any state/action pair and iterate through all the legal options. There is a nice extension of this that works very well for deterministic board games like chess, Go or tic-tac-toe: you notice that the input state has no bearing on the value of the move, and all that matters is the state after you move. This is called the afterstate representation, and it is a more efficient way to learn those games. Note that this is limited to value-based methods such as Q-learning, SARSA or Monte Carlo Control; policy-based methods must express the action in the output.
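A sketch of greedy afterstate selection, assuming a flattened tic-tac-toe board. The value function `V` here is a deliberately trivial stub standing in for whatever learned afterstate value function your agent maintains:

```python
import numpy as np

def afterstate(board, move, player=1):
    """Board after applying `move`; the game is deterministic,
    so this position alone determines the value of the move."""
    nxt = board.copy()
    nxt[move] = player
    return nxt

def V(board):
    # Stub standing in for a learned afterstate value function.
    return float(board.sum())  # illustrative only

def greedy_move(board, legal_moves):
    # Evaluate the position *after* each move, not the (state, action) pair.
    return max(legal_moves, key=lambda m: V(afterstate(board, m)))

board = np.zeros(9)  # empty 3x3 tic-tac-toe board, flattened
legal = [i for i in range(9) if board[i] == 0]
best = greedy_move(board, legal)
```

Because positions reached by different (state, action) pairs collapse to the same afterstate, the agent needs to learn far fewer distinct values than with a full state/action table.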
How do you decide between representations? Right now, you need domain knowledge of the problem you want to solve, plus a dash of creativity and technical analysis. There is no general solution, and no general way to create a learning agent that can figure out its own representations.
Probably a more general problem-solving AI would be fully embodied in the real world, using senses such as vision and hearing and being optimised around them, and for outputs it would have a physical body to control. But writing a chess-playing bot that works from a video feed of the board and plays like the original Mechanical Turk adds a large number of unnecessary layers and a lot of extra computation. Most agents remain narrowly defined and require the developer to understand and create the right specific interfaces. Embodied agents that learn to solve problems more generically are in development, but they are at the stage of navigating simple obstacles or stacking blocks; they won’t be teaching themselves chess by having the rules read to them and trying a few games for a few years yet.