I'm not sure it is a good match for the problem, although it would definitely be interesting to see RL techniques used to train a chatbot.
Universe is mostly about wrapping up an environment for easy interaction. If you wanted to write a chatbot that operates the front end of a chat window the way a human operator does, it could be useful there. However, most chatbots are designed on the assumption that you can integrate directly with the text stream, in which case Universe (or Gym) will not do much for you.
The big problem in training a chatbot using RL or other optimisation techniques is getting a reward signal. You need something that can assess the output and provide a signal such as “yes, that was an appropriate response”. Universe can do that at a technical level, but only by wiring up something that already assesses or responds to the chat messages. There isn’t really such a thing (if there were, it would already be close to being a chatbot capable of passing the Turing test).
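To make the reward problem concrete, here is a minimal sketch of a Gym-style text environment in plain Python. It does not use Gym or Universe; `ChatEnv`, `RewardFn` and `keyword_overlap_reward` are hypothetical names invented for illustration. The wrapper itself is trivial, and the scorer plugged into it is a crude stand-in, which is exactly the missing piece.

```python
from typing import Callable, List, Tuple

# Hypothetical scorer type: (user_message, bot_reply) -> reward.
RewardFn = Callable[[str, str], float]


class ChatEnv:
    """Toy environment with a Gym-like reset/step interface (illustrative only)."""

    def __init__(self, prompts: List[str], reward_fn: RewardFn):
        self.prompts = prompts
        self.reward_fn = reward_fn
        self.turn = 0

    def reset(self) -> str:
        # Start a new episode and return the first user message.
        self.turn = 0
        return self.prompts[0]

    def step(self, reply: str) -> Tuple[str, float, bool, dict]:
        # All of the difficulty is hidden inside reward_fn.
        reward = self.reward_fn(self.prompts[self.turn], reply)
        self.turn += 1
        done = self.turn >= len(self.prompts)
        next_message = "" if done else self.prompts[self.turn]
        return next_message, reward, done, {}


def keyword_overlap_reward(message: str, reply: str) -> float:
    """Placeholder scorer: fraction of the message's words echoed in the reply.

    This is NOT a real measure of whether a reply is appropriate.
    """
    message_words = set(message.lower().split())
    reply_words = set(reply.lower().split())
    if not message_words:
        return 0.0
    return len(message_words & reply_words) / len(message_words)
```

An agent would call `env.reset()`, then repeatedly pass its reply to `env.step(...)`. Swapping in a better `reward_fn` is the hard part, and nothing in the wrapper helps with that.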
If you want a self-learning chatbot, you would probably assess it using something more like a GAN, with a discriminator trying to detect the difference between generated conversations and real ones.
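As a rough illustration of the discriminator half of that idea, the sketch below trains a tiny logistic-regression classifier on bag-of-words features to estimate the probability that a piece of text is real rather than generated. It is a toy stand-in, not a real GAN: there is no generator, and `Discriminator`, `features` and `train` are hypothetical names chosen for this example. In a full setup the discriminator's output would be fed back to the chatbot as its reward signal.

```python
import math
from collections import defaultdict
from typing import Dict, List


def features(text: str) -> Dict[str, float]:
    """Bag-of-words counts plus a bias term."""
    feats: Dict[str, float] = defaultdict(float)
    feats["<bias>"] = 1.0
    for word in text.lower().split():
        feats[word] += 1.0
    return feats


class Discriminator:
    """Logistic regression estimating P(text is real). Illustrative only."""

    def __init__(self, lr: float = 0.5):
        self.weights: Dict[str, float] = defaultdict(float)
        self.lr = lr

    def prob_real(self, text: str) -> float:
        z = sum(self.weights[k] * v for k, v in features(text).items())
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, text: str, label: float) -> None:
        # One SGD step on the log loss; label is 1.0 for real, 0.0 for generated.
        error = label - self.prob_real(text)
        for k, v in features(text).items():
            self.weights[k] += self.lr * error * v


def train(disc: Discriminator, real: List[str], generated: List[str],
          epochs: int = 50) -> None:
    """Alternate passes over real and generated examples."""
    for _ in range(epochs):
        for text in real:
            disc.update(text, 1.0)
        for text in generated:
            disc.update(text, 0.0)
```

After calling `train` on a handful of real and generated replies, `disc.prob_real(reply)` could serve as a learned reward for a new reply. The chatbot would then be trained to raise that score while the discriminator keeps being retrained against the chatbot's latest output.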