Can I use Universe to record human demonstrations and apply behavioral cloning before RL?


The blog post says

We can use human performance as a meaningful baseline, and record human demonstrations by simply saving VNC traffic. We’ve found demonstrations to be extremely useful in initializing agents with sensible policies with behavioral cloning (i.e. use supervised learning to mimic what the human does), before switching to RL to optimize for the given reward function.
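For context, the behavioral-cloning step described in that quote amounts to ordinary supervised learning on recorded (observation, action) pairs. A minimal sketch with toy data — all names here are my own invention, not Universe's (unreleased) API:

```python
# Minimal behavioral-cloning sketch (toy data, hypothetical names):
# supervised learning that maps recorded observations to the human's
# actions, giving a sensible initial policy before RL fine-tuning.
import numpy as np

def clone_policy(observations, actions, lr=0.1, epochs=200):
    """Fit a multinomial logistic regression to mimic the demonstrator."""
    n_actions = int(actions.max()) + 1
    W = np.zeros((observations.shape[1], n_actions))
    onehot = np.eye(n_actions)[actions]
    for _ in range(epochs):
        logits = observations @ W
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        # cross-entropy gradient step toward the demonstrated actions
        W -= lr * observations.T @ (probs - onehot) / len(actions)
    return lambda obs: int(np.argmax(obs @ W))

# Toy demonstrations: the human picks action 0 when the first feature
# is negative and action 1 otherwise.
obs = np.array([[-1.0, 0.2], [-0.5, 0.1], [0.8, 0.0], [1.2, -0.3]])
acts = np.array([0, 0, 1, 1])
policy = clone_policy(obs, acts)
print(policy(np.array([-0.9, 0.0])))  # mimics the demonstrator: 0
```

In practice the observations would be VNC frames and the model a deep network, but the training loop is the same idea.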

So, can I use Universe to record my own demonstrations now?
If yes, how can I do this?
And how can I use the recordings?
Is there any documentation on this?


We haven’t released the code for this yet, but will in the coming weeks. If you’d like early access, please sign up here:


Does anyone know of any papers exploring this in a recent context? Everything I’m finding is either very theoretical or untested against recently developed algorithms like DQN.


Could you give some examples of these papers (even if they are not 100% related to this question)? I would be very interested in reading them. Thanks!


Here are my notes/references so far:

Cooperative Inverse Reinforcement Learning

  • good source of other references
  • human and robot share a reward function
  • human knows reward function
  • robot does not know reward function

good description of above:

feature-engineered version:

  • goal is motion of a robot arm through space
  • motion is defined by a linear set of points in state-space
  • clusters of points in state-space, grouped by human action, are used to guide the policy
  • tasks: scooping and pouring coffee beans, placing an item
    • quality of the completed task is measured by the weight of the scooped or poured beans
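If I understand the clustering idea above correctly, it can be sketched roughly like this: cluster the demonstrated state-space points and use the cluster centres as waypoints for the policy. K-means is my assumption here; the paper's exact method may differ, and all names are made up:

```python
# Hypothetical sketch: group demonstrated state-space points with
# k-means and use the cluster centres as policy waypoints.
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centres = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # assign each demonstrated point to its nearest centre
        dists = np.linalg.norm(points[:, None] - centres[None], axis=2)
        labels = dists.argmin(axis=1)
        # move each centre to the mean of its assigned points
        for j in range(k):
            if (labels == j).any():
                centres[j] = points[labels == j].mean(axis=0)
    return centres

# Demonstrated arm positions concentrated around two key poses.
demos = np.concatenate([
    np.random.default_rng(1).normal([0.0, 0.0], 0.05, (20, 2)),
    np.random.default_rng(2).normal([1.0, 1.0], 0.05, (20, 2)),
])
waypoints = kmeans(demos, k=2)  # centres near the two demonstrated poses
```

The centres then act as a compact summary of where the human spent time in state-space, which the policy can be biased toward.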
Robot Learning From Demonstration

pendulum swing-up task

  • uses video of a human hand to help the robot arm learn which actions to take
  • many issues stem from the physical differences between the robot arm
    and the human arm
    • can’t replicate the motion exactly
    • the human deviated from the 2D plane assumed in the robot’s case
    • different grip

high-level overview; pretty good, I think, though something about it is
unsatisfying. only motor-control tasks, no games

cart-pole robot arm:

Inverse Reinforcement Learning (slides)
high-level overview, good foundations


Any ETA on when the code for training agents from recorded human demos is going to be available?


Any leads on this?