My Approach to the Methodology in Learning To Communicate


Related to: Language from Game Theory

Hey all, I apologize if this isn’t the correct place to post this.

I’m writing because I was independently working on a project which was extremely similar to the linked OpenAI article, Learning to Communicate. Once the article came out I was happy to abandon the project since I assumed the people at OpenAI would have it much better in hand than I would.

On reflection though, I realized that my approach was slightly different from what was described in Learning to Communicate, and I thought I’d share my thinking on the off-chance that the people pursuing that line of research find it helpful.

For the most part, my approach was extremely similar to what was written in Learning to Communicate (in fact, it was really cool to see my own thinking in someone else’s work, as well as seeing the many things they thought of that I hadn’t). The major difference though was how our respective “games” were set up.

In essence, I had based mine around game theory, or at least economics-based exchange. This video clip is a good primer.

My world, or at least my initial world, was going to be made up of square cells. Agents, which occupy individual cells, can move about and exchange resources with each other, or can extract different colored resources from different-colored non-moving farms, which also occupy individual cells. Resources are subtracted periodically from agents in exchange for points, and having more diverse resources in your inventory means more points. Agents can also communicate with each other if they are within a certain range.
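
As a rough illustration of the "more diverse resources means more points" rule, here is a minimal Python sketch. Everything specific in it is an assumption made for the example and not part of the original design: the `harvest_points` name, taking one unit per color each period, and a reward that grows with the square of the number of distinct colors held.

```python
from collections import Counter


def harvest_points(inventory: Counter, per_color: int = 1) -> int:
    """Subtract up to `per_color` units of each held color and return points.

    Assumption: the reward is the square of the number of distinct colors
    that were subtracted, so a diverse inventory beats a large stock of a
    single color.
    """
    # Only colors the agent actually holds contribute this period.
    taken = {color: min(count, per_color)
             for color, count in inventory.items() if count > 0}
    inventory.subtract(taken)
    distinct = len(taken)
    return distinct * distinct


mixed = Counter(red=3, blue=1)
single = Counter(red=4)
print(harvest_points(mixed))   # two colors held
print(harvest_points(single))  # one color held
```

Under these assumed numbers, an agent holding two colors earns more per period than one holding the same total in a single color, which is the pressure that is supposed to make exchange worthwhile.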

There are a lot more details, but they don’t really matter. The core idea behind the game is to make the rules so that there is benefit to cooperating by economics, but also so that there is benefit to information sharing. In other words, it’s a very simple version of the conditions in which early man would have developed language in the first place.

Here are the potential benefits to this approach.

  • You don’t have to be too creative about what forms of complexity you introduce; just model real life. For example, one of the first developments I would consider making is the ability of agents to commit violence against each other. Some amount of cooperation would still be optimal, but violence might also be optimal in certain situations. I can tweak the parameters of the game to strike this balance, and thus make an entire range of vocabulary sensible for agents to develop.
  • The language might evolve in such a way so that its vocabulary more resembles our own, since the “game” is modeled after our own world.

I think that’s it. As I said, I thought it was a pretty long shot that anyone would find this useful, but I thought it was worth a go. Anyone can feel free to ask me questions or clarify where my thinking is wrong. I’m by no means knowledgeable on AI; I went into the project thinking it would be a good learning experience.


Hey prudentbot,

I was also thinking of a very similar goal (and also a different approach) for quite some time now. But I didn’t have the skills to do it, and I’m still planning to get another college degree to do this stuff, since I plan to do this long term. I’m glad I found OpenAI and this discussion. Maybe there are more of us who thought of the same idea but we just didn’t see each other.

I would like to know more about your approach. Can we discuss here or do we need to get in touch somewhere else? I’m wondering how agents get points and resources on the first run. Do they all have an equal amount? I imagine there would have to be a need to trade. Do they run out of points and therefore need to sell resources to get them?

I would also like to share my approach. I came up with it after I discovered Polyworld and Artificial Life Programming. These worlds are largely based on the concept of survival of the fittest. The idea I thought of is similar to what you wrote about introducing violence. Mine is a predator-prey relationship. Agents need to communicate to survive, rather than to get rewards. I thought agents might communicate by sending and receiving plain text (or something) to their surrounding agents. Predator agents can only send a certain character set which is different from the prey agents.

My hypothesis for the prey agents is that when they receive a plain text signal that is typically sent by a predator, they will learn to react accordingly. Another hopeful hypothesis is that they may also refer to predators based on the “sound” they make. Like if predators only speak “A” all the time, an overhumanized version of what the prey might say is “oh I saw an ‘A’ we gotta run!”. My hypothesis for the predator agents is that they will learn to be quiet when they hunt.

I have more hypotheses but unfortunately I couldn’t experimentally test them because of my lack of skill.


Hey vjlomosco, thanks for responding!

Yeah, it wouldn’t surprise me at all if there were a lot of people thinking along the same lines as us.

Your approach is actually almost exactly what mine was when the idea first started forming in my head a few years ago (I had a predator/prey relationship too!). I think I eventually stopped thinking about it like that when I realized you could create cooperative/antagonistic relationships without involving sentient third parties. Like, it definitely makes sense for predators to communicate (check out the second linked video, depicting the Wolfpack game), and it probably makes sense for prey to communicate, but there’s no incentive for predators to communicate with prey, since there’s no room for cooperation in a purely predatory relationship. That said, I would guess that the merits of these different approaches probably boil down to computer resource intensiveness (increased by complications in game rules) vs language complexity (or how the vocabulary that is sensible to develop scales with rule complexity), and I think we’re all just taking shots in the dark when it comes to those things.

As for my own game, I was a bit sparse on the details since I figured it would either be boring or I’d have to touch on stuff that would be subject to change if testing showed it didn’t work. So, sorry about that. Basically agents can take “actions” like moving, exchanging, communicating, or farming and these cost some amount of time. Maybe certain agents can farm certain colors of resources faster than other colors to incentivize exchange like in the Ted Talk I linked, or maybe that isn’t even necessary. The “points” don’t really mean anything in the game world, they’re only used internally to train that agent’s neural network. It would also be possible to do some sort of natural selection process like in Polyworld (or maybe even a combination of both natural selection and straight up neural network training, I’m sure the merits have been discussed by people more knowledgeable than us). The resources are subtracted automatically in exchange for points just to goose the agents into collecting diverse resources.

My version of the actual communication had agents basically just sending numbers to each other, that way you can just assign a number per word. Plaintext is just numbers too, if you think about it :)
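
To make the "one number per word" idea concrete, here is a small Python sketch. The vocabulary size of 16 and the function names are assumptions for illustration only; the second half just shows the sense in which plain text is already a sequence of numbers (its UTF-8 byte values).

```python
VOCAB_SIZE = 16  # assumed size of the agents' shared "vocabulary"


def is_valid_message(message, vocab_size=VOCAB_SIZE):
    """A message is a sequence of word ids, each in the range [0, vocab_size)."""
    return all(isinstance(token, int) and 0 <= token < vocab_size
               for token in message)


def text_to_numbers(text):
    """Plain text viewed as numbers: one integer per UTF-8 byte."""
    return list(text.encode("utf-8"))


def numbers_to_text(numbers):
    """Inverse of text_to_numbers."""
    return bytes(numbers).decode("utf-8")


print(is_valid_message([3, 0, 15]))  # every id is inside the vocabulary
print(text_to_numbers("run"))        # the same word as byte values
```

The agents never need to know what any number "means" up front; whatever meaning a number ends up carrying would have to come out of training.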

But anyway, thanks again for responding, and good luck with your degree, my friend. I just finished my CS undergrad. It’s a very cool field.


Hey prudentbot,

For the predator-prey, I thought agents can only send to all surrounding agents, not specific agents. This will be similar to how sound works. A predator can talk to fellows but a nearby prey can hear and run. With that there is both reason to communicate and not communicate. I had another set of hypotheses for that. But without experimental data I cannot tell if it will actually work. And yes, it’s also subject to change.
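
A minimal Python sketch of that "sound" model, where a message reaches everyone in range and cannot be addressed to a specific agent. The `Agent` class, the `broadcast` function, and the choice of a square (Chebyshev) distance on a grid are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    x: int
    y: int
    heard: list = field(default_factory=list)  # messages received so far


def broadcast(sender, agents, message, sound_range):
    """Deliver `message` to every other agent within `sound_range` cells.

    The sender cannot pick recipients: friends and enemies in range both
    hear it. Distance is measured as the larger of the x and y offsets
    (an assumed choice for a square-cell grid).
    """
    receivers = []
    for other in agents:
        if other is sender:
            continue
        distance = max(abs(other.x - sender.x), abs(other.y - sender.y))
        if distance <= sound_range:
            other.heard.append(message)
            receivers.append(other.name)
    return receivers


wolf = Agent("wolf", 0, 0)
packmate = Agent("packmate", 1, 1)
rabbit = Agent("rabbit", 2, 0)
distant = Agent("distant", 5, 5)
print(broadcast(wolf, [wolf, packmate, rabbit, distant], "A", sound_range=2))
```

Here the rabbit overhears the same signal the packmate does, which is what gives a predator a reason both to speak and to stay silent.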

Your approach is very interesting too. I would very much like to see more research on this. Thanks to Reddit, I found OpenAI and one other research project.

CS is a very nice field indeed. There are so many interesting things you can do that people who are not in the field will not see until it has been applied. R&D is especially good. But I’d still need to get a graduate degree to get people to take my research seriously.


What are your hypotheses? Do you have a specific vision for how your system would evolve?


What are your hypotheses?

  • Predators may communicate with each other
    • To find prey
      • Probably after a tweak that allows prey to fight back.
      • Or a tweak that lets two or more predators eat 1 prey and have enough
      • Or a tweak that lets them see further than the range of sound
    • To imitate the “voices” of prey and fool them
      • Probably after adjusting the overlap of their voices
        (e.g. predator voices: ‘A’, ‘B’ :: prey voices ‘B’, ‘C’ => oversimplified)
      • Or setting a cost for imitating them, since they may always speak prey language
  • Prey may communicate with each other
    • To avoid predators
      • Probably after a tweak that lets them see further than the range of sound
    • To find food
      • Probably after a tweak that food needs to be eaten by 2 prey and cannot be eaten by one prey twice
  • Prey will probably learn to keep quiet when predators are nearby
  • Predators will probably keep quiet if prey is nearby
  • Agents will not speak at all
    • I will probably adjust the range of sound and the range of vision to avoid this
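
The overlapping-voices example in the list above (predator voices ‘A’, ‘B’; prey voices ‘B’, ‘C’) can be sketched in a few lines of Python. The `VOICES` table uses exactly those oversimplified character sets; the function names are made up for the illustration.

```python
# Character sets each kind of agent is able to "speak", taken from the
# oversimplified example above.
VOICES = {
    "predator": {"A", "B"},
    "prey": {"B", "C"},
}


def can_utter(kind, signal):
    """True if every character of `signal` is in that kind's voice."""
    return all(ch in VOICES[kind] for ch in signal)


def possible_speakers(signal):
    """Which kinds of agent could have produced this signal?"""
    return {kind for kind in VOICES if can_utter(kind, signal)}


print(possible_speakers("A"))  # only a predator can say this
print(possible_speakers("B"))  # ambiguous: inside the overlap
```

A listener that hears ‘A’ can be sure a predator is near, while ‘B’ could come from either side. That shared character is the room a predator would have for imitating prey, and the size of the overlap is the knob being adjusted.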

Do you have a specific vision for how your system would evolve?

Will those tweaks qualify as evolution? :)

It’s possible that none of this will happen and something completely different will. Which is why I like research.
This system is probably complex, but I also thought about a simpler approach, similar to the other research I linked above: a speaker-listener relationship in a maze environment. One agent is trained to solve the maze by speaking to a hard-coded listener that follows the speech. Then, separately, another agent is trained to do what a hard-coded speaker says. Then run a world with both non-hard-coded agents (the actual AI agents trained separately). But I can’t help thinking it will probably work with fewer problems, and that’s just less exciting.


Just wanted to express a quick idea.

Introduce two character types. One that is stronger and can perform all game tasks, but can only think one step ahead, and one that is weaker and cannot perform the game tasks, but can think three steps ahead.

If the two work together they benefit from each other’s strengths. In order to earn cooperation from a weaker character type, the stronger character must amass enough resources to attract a weaker character. Once together, they can work cooperatively. In order to benefit from cooperation, the weaker character must tell the stronger character what to do.

The weaker character gains status points in relation to the amassed resources gained by being teamed with a stronger character. This could also lead to weaker characters separating from their current team to join with a more successful stronger character in order to gain more status points.

Stronger characters gain points in relation to the number of resources they have amassed.
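
A rough Python sketch of this scoring scheme, under some loudly simplifying assumptions: points equal the raw resource count, a weaker character's status simply equals its current partner's resources (not only the resources gained since teaming up), and the threshold a stronger character must reach to attract a partner is left out. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Strong:
    name: str
    resources: int = 0


@dataclass
class Weak:
    name: str
    partner: Optional[Strong] = None


def strong_points(strong):
    """Stronger characters score by the resources they have amassed."""
    return strong.resources


def status_points(weak):
    """Weaker characters score by their partner's resources (0 if unteamed)."""
    return 0 if weak.partner is None else weak.partner.resources


def maybe_switch(weak, candidates):
    """Join the richest candidate if that beats the current partner."""
    best = max(candidates, key=lambda s: s.resources)
    if weak.partner is None or best.resources > weak.partner.resources:
        weak.partner = best
    return weak.partner


slow = Strong("slow", resources=4)
fast = Strong("fast", resources=9)
planner = Weak("planner", partner=slow)
print(status_points(planner))                  # status with the first partner
print(maybe_switch(planner, [slow, fast]).name)  # defects to the richer one
```

Even in this stripped-down form, the weaker character's score depends entirely on how well its instructions help a partner gather resources, which is where the pressure to communicate would come from.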

In the end, the dependency upon each other in order to gain resources and status will create endless opportunities for language use.