While looking through some of the submitted results for the Gym environments, I found that many entries with extremely good training performance do not even show a learning-curve-shaped learning curve. In my personal opinion, they must have cheated.
This one is certainly an example of cheating:
The author trained the model beforehand, and then used the already-trained model to produce the "learning curve".
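A minimal sketch of why such a run is easy to spot. This uses synthetic reward numbers rather than a real Gym environment, and all function names here are illustrative, not from any submission: a curve logged from an already-trained model sits near the performance ceiling from episode one, while a genuine training run starts low and rises.

```python
import random

def fake_pretrained_curve(episodes=100, max_reward=500.0):
    # A model trained beforehand scores near the ceiling from the very
    # first "training" episode, so the logged curve is essentially flat.
    return [max_reward - random.uniform(0, 10) for _ in range(episodes)]

def genuine_training_curve(episodes=100, max_reward=500.0):
    # A real learner starts near zero and improves gradually (modeled
    # here as a noisy exponential approach toward the ceiling).
    curve, score = [], 10.0
    for _ in range(episodes):
        score += 0.1 * (max_reward - score)
        curve.append(score + random.uniform(-20, 20))
    return curve

def looks_like_learning(curve):
    # Crude red-flag check: if the first tenth of the episodes already
    # reaches most of the final performance, no learning is visible.
    k = len(curve) // 10
    early = sum(curve[:k]) / k
    late = sum(curve[-k:]) / k
    return early < 0.8 * late
```

With this check, the flat pretrained-model curve fails the "does it actually rise?" test while the genuine one passes, which is exactly the shape difference visible by eye in the submitted plots.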
Even with a learning-curve-shaped learning curve, there is no guarantee that the algorithm is good unless the submitter provides the raw code. One can easily manipulate the code to get a good learning curve out of a bad algorithm, as long as the model can eventually be trained to perform well.
We certainly welcome good training algorithms. However, the cheated results from these people are very misleading, and everyone's time is precious. I wish there were some rules to prevent this.