How to find the best 100 episode average manually?


#1

When I upload my results, I get the best 100 episode average, which I want to calculate manually. How do I go about this?


How do people know about class objects of each environment?
#2

There are really two ways to handle this: the first is to sum up the rewards for each episode yourself, then average those totals. What I did was to use the environment function env.get_episode_rewards(); I then take the average of that using array slicing and numpy's mean.

np.array(env.get_episode_rewards())[-100:].mean()
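
For context, here is a rough sketch of how that one-liner fits into a full run. The environment name, directory, and random policy are just placeholders, and it assumes the classic gym.wrappers.Monitor wrapper, which is what provides get_episode_rewards():

import gym
import numpy as np

env = gym.make('LunarLander-v2')                                 # placeholder environment
env = gym.wrappers.Monitor(env, '/tmp/lander-run', force=True)   # records per-episode rewards

for episode in range(200):
    obs = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()                       # stand-in for your agent's policy
        obs, reward, done, info = env.step(action)

# average total reward over the last 100 recorded episodes
print(np.array(env.get_episode_rewards())[-100:].mean())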


#3

Wait, I don’t quite get that. I am new to OpenAI Gym. How did you know about these built-in functions? And can you please explain the process in simpler terms? Thanks in advance.


#4

In Python you can use the built-in function dir to figure out what
functions are available on a variable, so print(dir(env)) gives the list; I
figured it out from there.

Second, the array slice -100: means take the last 100 elements of the
array and return them.

Mean is, well, the average.

Since they usually use the average of the last 100 episodes, that's why we
take the last 100 elements.
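
A quick sketch of both ideas (the environment name and the short reward list are just examples):

import gym
import numpy as np

env = gym.make('CartPole-v0')           # any environment works here
print(dir(env))                         # lists every attribute and method on the env object

rewards = [1.0, 2.0, 3.0, 4.0, 5.0]
print(np.array(rewards)[-100:])         # fewer than 100 elements: the slice just returns them all
print(np.array(rewards)[-100:].mean())  # mean of the last (up to) 100 entries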

Make sense? Lemme know if you’re still confused.

Tyson


#5

Also, sorry, one more follow-up.

As “painful” as it can be, you can go searching through the repository for things that you’re looking for.

For example there is a “local scoreboard” you can run when you’re done. Basically, browse around and look for stuff. In some ways the documentation is lacking, but it’s hard to organize these kinds of topics in a coherent manner, so I’m not surprised they’re not obvious:

from gym.scoreboard import scoring

# tdir is the directory your monitor recorded to
results = scoring.score_from_local(tdir)
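
Roughly, that would be used like this (the directory is a placeholder, and this assumes a gym version old enough to still ship the gym.scoreboard module):

from gym.scoreboard import scoring

tdir = '/tmp/lander-run'                 # placeholder: the directory your monitor wrote to
# ... wrap your env with the monitor pointing at tdir, run your episodes, close the env, then:
results = scoring.score_from_local(tdir)
print(results)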

#6

Okay, dir is something new I’ve just learned.
So wait, they only take the last 100 episodes into consideration 0.o?


#7

How do I go about “searching” which repository? How and where do I find all this stuff? Thanks for the reply, BTW.


#8

No problem!

If you look here there is a “search this repository” box; you can do some searching from there. That’s one way to search for things (it’s laborious, but it’s an option).

I’ll be honest, I have only played with OpenAI Gym for about 2 weeks, through the course of a project in a GA-Tech class I’m taking. I just realized I needed x, y, z, started looking, and found them.

If you don’t want to include video in your environment but still want to be able to upload etc., just set video_callable=False in the env monitor wrapper.
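
For example, something like this (assuming the gym.wrappers.Monitor interface and a placeholder directory):

# records episode statistics and allows uploading, but never saves video
env = gym.wrappers.Monitor(env, '/tmp/lander-run', video_callable=False)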


#9

Ok so if you look here for example:

It says at the end of the description,

LunarLander-v2 defines “solving” as getting average reward of 200 over 100 consecutive trials.

Thus if you take the last 100 trials and average them, you should get your “average” over 100 trials. Since you’re slicing with -100, it will always grab the last 100 every episode you run; in the early stages you don’t actually have 100 episodes yet, so it might spit out a warning, but I wouldn’t worry about it.

I was using it as a termination decision, i.e. stop running episodes when I hit a 200 average, but you’ve got to make sure that you run the minimum 100 episodes. So if you’re “re-training” (meaning you already have good weights), make sure you run 100 episodes before letting that termination occur.

e.g.

if np.array(env.get_episode_rewards())[-100:].mean() > 200 and episode_count > 100:
    break
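
Sketched out a little more fully (this assumes the monitor wrapper from earlier in the thread; the policy and threshold are placeholders):

solve_threshold = 200                              # LunarLander-v2's "solved" definition
episode_count = 0

while True:
    obs = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()         # stand-in for your trained policy
        obs, reward, done, info = env.step(action)
    episode_count += 1

    last_100 = np.array(env.get_episode_rewards())[-100:]
    if episode_count > 100 and last_100.mean() > solve_threshold:
        break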

#10

Here is code to find the best 100 episode average:

# rewards: array of per-episode total rewards; name: a label for the printout
ma = np.convolve(rewards, np.ones((100,))/100, mode='valid')
print('{}: Maximum 100 Episode Average = {:.3g} around Episode {}'
      .format(name, np.max(ma), np.argmax(ma)))

#11

Can you please explain how this piece of code works?


#12

This “slides” a vector of 100 ones along your data vector, performing element-wise multiplication at each position and summing the result (i.e. a convolution). To get the average of this sliding window, we divide by 100.
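
To make that concrete, here is a small self-contained check (the reward values are made up) showing that the convolution matches an explicit sliding-window average:

import numpy as np

rewards = np.arange(250, dtype=float)          # made-up per-episode rewards

# convolution: a window of 100 ones (divided by 100) slides along the data
ma = np.convolve(rewards, np.ones((100,)) / 100, mode='valid')

# explicit version: average each consecutive block of 100 episodes
manual = np.array([rewards[i:i + 100].mean() for i in range(len(rewards) - 99)])

print(np.allclose(ma, manual))                 # True
print('best 100-episode average: {:.3g} around episode {}'.format(ma.max(), ma.argmax()))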