EDIT: Didn’t realize I had already replied to this post, but oh well, I’ll leave this here, and hopefully not have any reason to reply again.
There is a difference between training a model and testing an already-trained one. I trained a model until it achieved the goal of 200, then re-ran the trained model (as per a class assignment), and it didn’t get there that quickly — it still took multiple hundreds of runs. No, this isn’t cheating; it’s the difference between training a model and testing a model.
As for cheating by randomly selecting 1000 samples: I’d argue the approach in the link you provided is simply another way to solve the problem. Instead of acting epsilon-greedy from the start, they chose random actions for a while to explore the state space before doing any learning. That’s an exploration-vs-exploitation trade-off made by the algorithm designer. And what really counts as cheating in this context? This isn’t truly a contest; it’s simply a metric we can choose to aim for. There’s no money or fame involved as far as I’m aware, and even if there were, the algorithm would be scrutinized to decide whether it was worthy of that fame or not.
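To make the distinction concrete, here’s a minimal sketch of the two exploration styles being compared. The function name, the `warmup_steps` parameter, and the Q-value list are my own illustrative assumptions, not anything from the linked code:

```python
import random

def choose_action(q_values, step, eps=0.1, warmup_steps=0):
    """Pick an action index for the current step.

    Two exploration strategies:
    - warmup_steps > 0: act purely at random for the first N steps
      (explore the state space before any learning), then act greedily.
    - warmup_steps == 0: classic epsilon-greedy, exploring with
      probability eps on every step.
    """
    n_actions = len(q_values)
    if step < warmup_steps:
        return random.randrange(n_actions)  # pure random exploration phase
    if warmup_steps == 0 and random.random() < eps:
        return random.randrange(n_actions)  # epsilon-greedy exploration
    # exploit: pick the action with the highest estimated value
    return max(range(n_actions), key=q_values.__getitem__)
```

Either way the agent ends up exploiting its value estimates; when and how it explores first is a design choice, not a trick.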