How to use a trained model


#1

Hi all,

I am a bit lost in the OpenAI Universe framework.
What I want to do is:

  • train a TensorFlow model using the universe-starter-agent (done)
  • load the trained model and watch it play Pong (here I am a bit lost)

First of all, I do not know whether this is possible. What I am asking is whether there is any tutorial or documentation on how to use a trained model with an environment, so I can watch how the trained model interacts with it.
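
Roughly, what I imagine is something along these lines (just a sketch of the idea; the tensor names and the checkpoint path below are my guesses, not the actual universe-starter-agent API):

import gym
import tensorflow as tf

env = gym.make("PongDeterministic-v3")

with tf.Session() as sess:
    # Restore the graph and weights saved during training.
    # The checkpoint path and file name are guesses; adjust to wherever
    # the checkpoints actually live under the log directory.
    saver = tf.train.import_meta_graph("/tmp/pong/train/model.ckpt.meta")
    saver.restore(sess, tf.train.latest_checkpoint("/tmp/pong/train"))

    graph = tf.get_default_graph()
    # "observation:0" and "sampled_action:0" are placeholder names for
    # whatever tensors the saved graph uses for input and action output.
    obs_ph = graph.get_tensor_by_name("observation:0")
    action_op = graph.get_tensor_by_name("sampled_action:0")

    obs = env.reset()
    done = False
    while not done:
        env.render()
        action = sess.run(action_op, feed_dict={obs_ph: [obs]})[0]
        obs, reward, done, _ = env.step(action)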

Thanks in advance. Marcello.


#2

Hey Marcello,

Did you run the train command with the --visualise flag? It opens VNC windows.

python train.py --num-workers 2 --env-id PongDeterministic-v3 --log-dir /tmp/pong --visualise
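
If it helps, my understanding is that the flag just makes the worker render each frame as it steps the environment, roughly like this (a simplified sketch, not the exact universe-starter-agent worker code; policy.act and visualise here are placeholders):

# Simplified rollout loop with visualisation switched on.
last_state = env.reset()
while True:
    action = policy.act(last_state)                # placeholder for the policy's action
    state, reward, terminal, info = env.step(action)
    if visualise:
        env.render()                               # this is what --visualise enables
    if terminal:
        state = env.reset()
    last_state = state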


#3

Hi James,

Yes, I use the --visualise flag.
I am on a Mac.
But it does not open any window, and if I try to connect via VNC it tells me that I cannot connect to my own screen.
I connect using open vnc://localhost:5900
If I use the gym environment I can connect via VNC to the Docker machine, but training is very slow.

Thanks.


#4

Do you get any errors? Or just no response and no video?


#5

Hi James,

No, I do not get any errors or warnings in the tmux console.
I launch it with the following command:
python train.py --num-workers 1 --env-id PongDeterministic-v3 --log-dir /tmp/pong --visualise
I get the following log in the console:

Executing the following commands:
mkdir -p /tmp/pong
echo /Users/marcelloleida/anaconda/envs/universe-starter-agent/bin/python train.py --num-workers 1 --env-id PongDeterministic-v3 --log-dir /tmp/pong --visualise > /tmp/pong/cmd.sh
tmux kill-session -t a3c
tmux new-session -s a3c -n ps -d bash
tmux new-window -t a3c -n w-0 bash
tmux new-window -t a3c -n tb bash
tmux new-window -t a3c -n htop bash
sleep 1
tmux send-keys -t a3c:ps 'CUDA_VISIBLE_DEVICES= /Users/marcelloleida/anaconda/envs/universe-starter-agent/bin/python worker.py --log-dir /tmp/pong --env-id PongDeterministic-v3 --num-workers 1 --visualise --job-name ps' Enter
tmux send-keys -t a3c:w-0 'CUDA_VISIBLE_DEVICES= /Users/marcelloleida/anaconda/envs/universe-starter-agent/bin/python worker.py --log-dir /tmp/pong --env-id PongDeterministic-v3 --num-workers 1 --visualise --job-name worker --task 0 --remotes 1' Enter
tmux send-keys -t a3c:tb 'tensorboard --logdir /tmp/pong --port 12345' Enter
tmux send-keys -t a3c:htop htop Enter

no server running on /private/tmp/tmux-501/default
Use tmux attach -t a3c to watch process output
Use tmux kill-session -t a3c to kill the job
Point your browser to http://localhost:12345 to see Tensorboard

And if I go to the tmux console I see normal logging, for example:

[2017-02-24 10:26:33,249] Episode terminating: episode_reward=-19.0 episode_length=1377
[2017-02-24 10:26:33,262] Resetting environment
Episode finished. Sum of rewards: -19. Length: 1377
[2017-02-24 10:26:38,754] Episode terminating: episode_reward=-20.0 episode_length=991
[2017-02-24 10:26:38,767] Resetting environment
Episode finished. Sum of rewards: -20. Length: 991
[2017-02-24 10:26:43,359] Episode terminating: episode_reward=-21.0 episode_length=826
[2017-02-24 10:26:43,374] Resetting environment
Episode finished. Sum of rewards: -21. Length: 826
[2017-02-24 10:26:47,960] Episode terminating: episode_reward=-21.0 episode_length=824
[2017-02-24 10:26:47,974] Resetting environment
Episode finished. Sum of rewards: -21. Length: 824
[2017-02-24 10:26:53,458] Episode terminating: episode_reward=-20.0 episode_length=985
[2017-02-24 10:26:53,473] Resetting environment
Episode finished. Sum of rewards: -20. Length: 985
[2017-02-24 10:26:58,139] Episode terminating: episode_reward=-20.0 episode_length=838
[2017-02-24 10:26:58,152] Resetting environment
Episode finished. Sum of rewards: -20. Length: 838
[2017-02-24 10:27:02,780] Episode terminating: episode_reward=-21.0 episode_length=824
[2017-02-24 10:27:02,795] Resetting environment
Episode finished. Sum of rewards: -21. Length: 824
[2017-02-24 10:27:09,368] Episode terminating: episode_reward=-21.0 episode_length=1182
[2017-02-24 10:27:09,382] Resetting environment
Episode finished. Sum of rewards: -21. Length: 1182
[2017-02-24 10:27:17,370] Episode terminating: episode_reward=-21.0 episode_length=1434

So it seems that there are no particular issues.
I can see TensorBoard getting correctly updated in the browser.
But once I start the process I cannot see any window with Pong being played, and I cannot connect via VNC (it says I cannot connect to my own screen).

Thanks in advance.


#6

I bet it's a problem with your VNC client. I tried to look through their commits and found this: https://github.com/openai/universe-starter-agent/pull/24/commits/82bae63a2842be9a19e071cb68ff577ea40beacd

I don't know if it will help much, but I had to upgrade my VNC driver to view Pong.


#7

Hi Marcello,

I am currently studying computing and am in my dissertation semester; having decided to up the game and the learning curve, I have chosen RL and neural networks as my subject.

My supervisor recommended a rather good implementation by Andrej Karpathy.

I hope it helps: http://karpathy.github.io/2016/05/31/rl/
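
The core of that post is a tiny policy-gradient agent for Pong; the key trick is weighting each action's log-probability by a discounted return, roughly like this (a condensed sketch of the idea, not Karpathy's actual script):

import numpy as np

gamma = 0.99  # discount factor

def discount_rewards(rewards):
    # Discounted return, reset whenever a point is scored (reward != 0 in Pong).
    discounted = np.zeros_like(rewards, dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        if rewards[t] != 0:
            running = 0.0
        running = running * gamma + rewards[t]
        discounted[t] = running
    return discounted

# The policy gradient then scales each grad log p(a_t | s_t)
# by the (normalized) discounted return from step t onwards.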


#8

Hey Marcello,
I have run into the same problem as you. Were you able to resolve it? I am able to get the VNC version of the starter agent running. On a Mac, using VNC Viewer 6.0.1, I am able to see the VNC version, but I have to connect to port 5901 (localhost:5901).
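
From a terminal that is just (the same open command Marcello used, only pointing at the other port):

open vnc://localhost:5901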


#9

Hi Ross,
I could not manage to connect to the instance screen directly; I only managed to connect when running the simulation using Docker images.
I will try again in the near future and post here if I manage to get it working.
Cheers.