Doubt about env.reset() function


#1

Dear all,

I have been playing around with the code from here. I slightly modified it to see exactly what env.reset() returns. Please see the code below:

import gym
import universe
import numpy as np

def run_episode(env, parameters):
    total_reward = 0
    for _ in range(100):
        env.render()
        observation_n = env.reset()
        action = 0 if np.matmul(parameters, observation_n) < 0 else 1
        _, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward

env = gym.make('CartPole-v0')
parameters = np.random.rand(4) * 2 - 1
learning_rate = 0.1
best_reward = 0
for _ in range(1000):
    new_parameters = parameters + learning_rate * (np.random.rand(4) * 2 - 1)
    episode_reward = run_episode(env, new_parameters)
    print("episode_reward %d best reward %d" % (episode_reward, best_reward))
    if best_reward < episode_reward:
        best_reward = episode_reward
        parameters = new_parameters
    if episode_reward == 100:
        break

So I take the observation directly from env.reset() rather than from what env.step() returns after the agent's action. From my understanding these should be the same, but when I run this code I always get a reward of 1.000.

Am I missing something? Is my understanding of env.reset() wrong?
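
To illustrate what I mean, here is a minimal check (the variable names are mine) of what env.reset() and env.step() return in this version of the Gym API:

import gym

env = gym.make('CartPole-v0')

# env.reset() returns the initial observation: for CartPole a
# 4-element array (cart position, cart velocity, pole angle,
# pole angular velocity).
initial_obs = env.reset()
print(initial_obs)

# env.step() returns the observation *after* applying an action,
# plus the reward, the done flag, and an info dict.
next_obs, reward, done, info = env.step(env.action_space.sample())
print(next_obs, reward, done)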

Edit: I just realized that env.reset() re-initializes the environment. So every time I call env.reset(), the environment starts over from its initial state. And since the pole is in the upright position whenever it starts, my reward is always 1.000. Is my understanding correct?
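
For comparison, here is a sketch of what I believe the intended pattern is (same setup as my code above): call env.reset() once at the start of the episode, then feed the observation returned by env.step() back into the policy on each iteration:

def run_episode(env, parameters):
    total_reward = 0
    # Reset once, at the start of the episode, to get the initial observation.
    observation = env.reset()
    for _ in range(100):
        env.render()
        action = 0 if np.matmul(parameters, observation) < 0 else 1
        # Use the observation produced by the action, not a fresh reset.
        observation, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward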


#2

You’re correct. Each environment implements its own reset function, which you can see in the code for that env: https://github.com/openai/gym/blob/master/gym/envs/classic_control/cartpole.py#L90.
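
For reference, CartPole's reset (paraphrased from the linked file, not the verbatim source) essentially re-samples the 4-dimensional state near the upright equilibrium and returns it as the observation:

# Paraphrased sketch of CartPoleEnv.reset() from the linked cartpole.py:
def reset(self):
    # Sample all four state variables uniformly from [-0.05, 0.05],
    # so the pole starts (nearly) upright and (nearly) at rest.
    self.state = self.np_random.uniform(low=-0.05, high=0.05, size=(4,))
    self.steps_beyond_done = None
    return np.array(self.state)

That's why every call to reset() puts the pole back upright and your reward is always 1.000.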