Input layer for snake game


#1

Related to: [Reinforcement Learning]

Hi!
I have a question concerning the input layer. If your inputs are liable to vary in size, for instance in snake, where you want to keep track of all the snake's body parts, how would you structure your input layer?
I think the same kind of problem arises in board game environments. In chess, for instance, when you lose a piece, do you have to modify the output layer structure?

Thanks a lot!


#2

Try to keep the structure of the input representation stable*. That may mean over-specifying some things, using lots of sparse inputs, or accepting some compromise.

For snake, assuming everything is on a single screen, use pixel values, perhaps downsampled, and perhaps more than one frame in order to represent motion. That is how the DQN Atari games player was done. You could augment this with metadata about the snake if you have it available and want the agent to use it. Performance and/or training time may be better if you supply metadata instead of waiting for the agent to learn everything from the pixels.
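To make that concrete, here is a minimal sketch of a fixed-size pixel observation, assuming an 84×84 grayscale downsample and a stack of the last four frames (the values used in the DQN paper). The preprocessing is deliberately crude and the names are just placeholders:

```python
import numpy as np
from collections import deque

# Hypothetical preprocessing: downsample each RGB frame to 84x84 grayscale and
# stack the last 4 frames, so the CNN input has a fixed shape regardless of how
# long the snake's body currently is. Assumes the raw frame is at least 84x84.
FRAME_SHAPE = (84, 84)
STACK_SIZE = 4

def preprocess(rgb_frame: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) uint8 frame to grayscale and crudely downsample it."""
    gray = rgb_frame.mean(axis=2)                        # grayscale
    h, w = gray.shape
    step_h, step_w = h // FRAME_SHAPE[0], w // FRAME_SHAPE[1]
    return gray[::step_h, ::step_w][:FRAME_SHAPE[0], :FRAME_SHAPE[1]] / 255.0

class FrameStack:
    """Keeps the most recent STACK_SIZE processed frames as one observation."""
    def __init__(self):
        self.frames = deque(maxlen=STACK_SIZE)

    def reset(self, first_frame: np.ndarray) -> np.ndarray:
        processed = preprocess(first_frame)
        for _ in range(STACK_SIZE):
            self.frames.append(processed)
        return np.stack(self.frames, axis=0)             # shape (4, 84, 84)

    def step(self, next_frame: np.ndarray) -> np.ndarray:
        self.frames.append(preprocess(next_frame))
        return np.stack(self.frames, axis=0)             # shape (4, 84, 84)
```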

* At least for input into any machine-learning function approximator, such as a neural network.


#3

Thanks for your answer! Indeed, a CNN seems to be the way to go for input representation, i.e. learning from raw pixels. However, how would you then set up the output layer? I mean, in snake the actions are not liable to change (up, down, left, right), but what if you want your agent to play chess? Depending on which pieces you have left, there are actions that do not even exist. How would you manage that?

Thanks!


#4

Designing good representations for both input and output is a bit of an art form.

I think the Atari platform was chosen quite deliberately to demonstrate a general learning approach, because all the games share a common controller. Actions in the DQN Atari paper consist of choices of which button to press on the controller. The controller is the same for all games, so the same output layer works for all of them, and thus it was possible to run the exact same agent against every game.

In general in reinforcement learning, you have two broad choices for how to represent an action (both are sketched in code below):

  • As a representation of the desired state

  • As some vector that can fully represent all possible choices that could be made during the game

Which one has the most merit depends on how easy it is to code the representation, plus how easy it would be for a learning agent to handle it. Simple action spaces where the actions are finite and can be enumerated consistently can in theory all share the same general output model, such as a softmax layer. A lot of the teaching literature for RL uses these kinds of action spaces, because dealing with the complexities of representation is a distraction from learning about the RL techniques.
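As a rough illustration of the two choices, here is a minimal sketch (the layer sizes and encodings are made up, not tied to any particular environment). The first network enumerates every action in its output, as a softmax or Q-value layer would; the second takes a state together with an encoded action (or desired state) as input and is queried once per option:

```python
import torch
import torch.nn as nn

STATE_DIM = 64     # hypothetical flattened state encoding
N_ACTIONS = 4      # e.g. up, down, left, right
ACTION_DIM = 8     # hypothetical encoding of an action / desired next state

# Choice 2: action expressed in the output.
# One forward pass gives a value (or probability) for every enumerated action.
q_out = nn.Sequential(
    nn.Linear(STATE_DIM, 128),
    nn.ReLU(),
    nn.Linear(128, N_ACTIONS),     # one value per action
)

# Choice 1: action (or desired state) expressed in the input.
# The network scores one state/action pair at a time, so you loop over
# whatever options the environment currently allows.
q_in = nn.Sequential(
    nn.Linear(STATE_DIM + ACTION_DIM, 128),
    nn.ReLU(),
    nn.Linear(128, 1),             # single value for this particular pair
)

state = torch.randn(1, STATE_DIM)
all_action_values = q_out(state)                        # shape (1, N_ACTIONS)

action_encoding = torch.randn(1, ACTION_DIM)            # encode one legal move
one_pair_value = q_in(torch.cat([state, action_encoding], dim=1))  # shape (1, 1)
```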

In environments like classic board games, you have two ways to cope with the “over-expression” this leads to (i.e. where you can specify large numbers of illegal moves due to how you have encoded the action):

  • With action choice expressed in the output, you can filter and re-normalise action choices based on what the environment allows (see the sketch after this list). It doesn’t matter if the neural network assigns a probability to an illegal action; you just ignore it at the point of action selection. The NN will never learn anything about that action in situations where it is not allowed, so you have to keep filtering it out whenever it comes up.

  • With action choice expressed in the input, you can assess the value of any state/action pair and iterate through all the options (also sketched below). There is a nice extension to this that works very well for deterministic board games like chess, go or tic tac toe: once you notice that the starting state has no bearing on the value of the move, and all that matters is the state after you move, you can evaluate that resulting state directly. This is called the afterstate representation, and it is a more efficient way to learn those games. Note this is limited to value-based methods such as Q-learning, SARSA or Monte Carlo Control; policy-based methods must express an action in the output.
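Here is a sketch of both options, assuming PyTorch and a hypothetical `env` object that can list its legal moves and preview the position each one leads to (neither helper comes from a real library):

```python
import torch
import torch.nn.functional as F

# (1) Action choice in the output: mask and re-normalise.
# `logits` is the network's raw output over the full, fixed action space and
# `legal` is a boolean mask supplied by the environment (both hypothetical).
def masked_policy(logits: torch.Tensor, legal: torch.Tensor) -> torch.Tensor:
    masked_logits = logits.masked_fill(~legal, float("-inf"))   # illegal -> -inf
    return F.softmax(masked_logits, dim=-1)   # probabilities over legal moves only

# (2) Action choice in the input / afterstate: score each legal option in turn.
# `value_net` maps an encoded position to a single value; `env.legal_moves()`
# and `env.preview(move)` are assumed helpers that list the legal moves and
# return the encoded position that would result from each one.
def greedy_afterstate_move(value_net, env):
    best_move, best_value = None, float("-inf")
    for move in env.legal_moves():
        afterstate = env.preview(move)            # position after the move
        value = value_net(afterstate).item()
        if value > best_value:
            best_move, best_value = move, value
    return best_move
```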

How do you decide or choose between representations? Right now, you need domain knowledge of the problem to solve, plus a dash of creativity and technical analysis. There is no general solution, and no general way to create a learning agent that can figure out its own representations.

A more general problem-solving AI would probably be fully embodied in the real world, using senses such as vision and hearing and optimised around those, and for outputs it would have a physical body to control. But writing a chess-playing bot that works from a video feed of the chess game and plays like the original Mechanical Turk adds a large amount of unnecessary layers and computation. Most agents remain narrowly defined and require the developer to understand and create the right specific interfaces. Embodied agents that learn to solve problems more generically are in development, but they are at the stage of navigating simple obstacles or stacking blocks; they won’t be teaching themselves chess by having the rules read to them and trying a few games for a few years yet.


#5

There are a number of different things you could do. For example, you could have two grids of squares in front of the snake (or anywhere near the snake) that turn and move with the snake. One grid might contain ones and zeros corresponding to whether each square contains a part of the snake (or lies beyond the bounds of the room the snake moves in). The other grid might contain ones and zeros corresponding to the presence of an apple (or whatever) in each square. The maximum performance would be somewhat limited, but depending on the size of the input fields, I don’t think it would be too difficult to train. The outputs could be left or right.

Output exceeding a certain threshold in the left or right output would result in the snake turning left or right. Conflicts would be resolved somehow. For example, either the strongest signal wins out, or the snake goes neither left nor right - I think this choice would affect the eventual decisiveness of the network. When presented with a wall, the snake would be punished for choosing to go both left and right, since it would just keep going straight into the wall.
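As a very rough sketch of that idea (the window size, helper names and data structures below are all made up for illustration, and for simplicity the window is centred on the head rather than rotated with the snake’s heading), the two grids and the left/right decision might look something like this:

```python
import numpy as np

# `body` is a list of (row, col) segments with the head first, `apples` is a
# set of (row, col) positions, and `board_shape` is (rows, cols). A fuller
# version would rotate the window to match the snake's current heading.
WINDOW = 5   # 5x5 patch around the snake's head

def local_grids(body, apples, board_shape):
    head_r, head_c = body[0]
    occupied = np.zeros((WINDOW, WINDOW), dtype=np.float32)  # body / walls
    food = np.zeros((WINDOW, WINDOW), dtype=np.float32)      # apples
    half = WINDOW // 2
    for i in range(WINDOW):
        for j in range(WINDOW):
            r, c = head_r + i - half, head_c + j - half
            out_of_bounds = not (0 <= r < board_shape[0] and 0 <= c < board_shape[1])
            if out_of_bounds or (r, c) in body:
                occupied[i, j] = 1.0
            if not out_of_bounds and (r, c) in apples:
                food[i, j] = 1.0
    return np.stack([occupied, food])    # fixed shape (2, WINDOW, WINDOW)

def choose_turn(left_output, right_output, threshold=0.5):
    # Resolve conflicts by letting the stronger signal win; if neither output
    # exceeds the threshold, the snake keeps going straight.
    if max(left_output, right_output) < threshold:
        return "straight"
    return "left" if left_output >= right_output else "right"
```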

I think there are a lot of ways you could implement the training. I’m just putting an idea out there.