Understanding OpenAI's A2C implementation


I am not a software engineer, but I work with code on a daily basis. Recently, I learned about the A3C algorithm and wanted to program it myself to see how it works. Being very new to machine learning, I decided to avoid black-box solutions and wrote the algorithm from scratch in Mathematica.

From experimenting with it, I came to see my Mathematica code as more of a proof of concept. If I want the system to learn more efficiently and to benefit from the wealth of optimizations developed by the community, I should use TensorFlow to implement the neural network.

OpenAI has published a synchronous variant of the A3C algorithm, called A2C.

However, I find this code very hard to read and understand. Most variables have very short, non-descriptive names, and the code contains literally zero comment lines, so nothing is explained.

Does OpenAI provide (or plan to provide) detailed pedagogical tutorials on their algorithm code?

Although the A2C code is not very reader-friendly, I plan to delve into it and try to understand it. Is there anyone on this forum who already understands the code and would be able and willing to answer some questions about it?


mostly, i’ll help you bump the question.

i don’t have much of a clue about all those intricate lines of code either. but that’s the huge challenge of open-sourcing code: making it readable to other people, building it up from small blocks that are easy to understand. an especially difficult job in top-notch tech.