Related to: [Name of Request for Research]
I trained a PPO agent on the single-player and multi-player snake problems; the code is at https://github.com/ingkanit/multi-snake-RL
In the single-player setup, I achieved good results after some
hyperparameter tweaking. The multi-player setup proved to be quite
challenging: the main problem was that early exploration often biased
the training data toward configurations where one agent was already dead,
so the network never learned to keep two agents alive. To address this, I tried
restricting the training data and using a small amount of domain knowledge.
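One way to restrict the training data along these lines is to truncate each rollout at the first step where an agent dies, so PPO updates only ever see two-agents-alive configurations. The sketch below is a hypothetical illustration of that idea, not the repo's actual implementation; the per-step dict format with an `alive` tuple is an assumption.

```python
def filter_both_alive(trajectory):
    """Keep the prefix of a trajectory where both agents are alive.

    Each step is a dict with an 'alive' tuple (agent0_alive, agent1_alive).
    Truncating at the first death (rather than dropping individual steps)
    keeps the time axis contiguous for advantage estimation.
    """
    kept = []
    for step in trajectory:
        if not all(step["alive"]):
            break  # stop at the first step where either agent is dead
        kept.append(step)
    return kept


# Toy rollout: agent 1 dies at the third step, so only the first
# two transitions would be used for the PPO update.
traj = [
    {"alive": (True, True), "reward": (1.0, 0.0)},
    {"alive": (True, True), "reward": (0.0, 1.0)},
    {"alive": (True, False), "reward": (0.0, -1.0)},
    {"alive": (True, False), "reward": (1.0, 0.0)},
]
filtered = filter_both_alive(traj)  # keeps the first 2 steps
```

A trade-off worth noting: discarding post-death transitions also removes the negative signal from dying, which is presumably where the "small amount of domain knowledge" has to compensate.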
Videos and a discussion of my "hyperparameter journey" are at https://deeprljungle.wordpress.com. Let me know if you have any ideas for improvement or comparisons with other algorithms!