MSCS Thesis topic help in Deep RL using human feedback


I want to do my MSCS thesis using OpenAI tools, and I would really appreciate any suggestions for a topic. I am especially interested in deep reinforcement learning from human feedback.


Are you looking to create a new utility for OpenAI (like having deep learning demonstrate a specific novel task), or are you attempting to expand machine learning theory or mathematical models?


I am looking to create a new utility. I would really appreciate advice on any novel task where I could apply deep RL with human feedback. I was thinking of music composition guided by human preferences, but I haven’t worked in audio synthesis or related fields yet, so I am not sure whether composing AI music this way would actually be useful.


If you specifically want to add to the Gym or Universe tools, then requiring humans in the loop might not work well. The environments are set up to run fully automated once started, and typically run for thousands of iterations to allow reinforcement learning to occur. If you have examples of subjectively good vs. bad music, you could train a classifier on them to automate the scoring process, or perhaps even use a GAN to keep the output within a human-like distribution.
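To make the classifier-as-reward idea concrete, here is a minimal, hypothetical sketch. The features, the toy "good vs. bad melody" data, and the logistic-regression reward model are all my own illustrative assumptions, not a real music-scoring system: "good" sequences move in small melodic steps while "bad" ones jump randomly, and the trained classifier's probability of "good" is then used as an automated reward signal, so no human needs to score each rollout.

```python
import math
import random

random.seed(0)

def features(seq):
    # Hypothetical hand-crafted features for a pitch sequence:
    # mean absolute interval and fraction of repeated notes.
    # A real project would use much richer audio/symbolic features.
    intervals = [abs(b - a) for a, b in zip(seq, seq[1:])]
    mean_step = sum(intervals) / len(intervals)
    repeats = sum(1 for i in intervals if i == 0) / len(intervals)
    return [1.0, mean_step, repeats]  # leading 1.0 is the bias term

def make_good():
    # Toy "good" melody: small steps around a starting pitch.
    seq = [60]
    for _ in range(15):
        seq.append(seq[-1] + random.choice([-2, -1, 0, 1, 2]))
    return seq

def make_bad():
    # Toy "bad" melody: large random pitch jumps.
    return [random.randint(40, 80) for _ in range(16)]

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp for numerical safety
    return 1.0 / (1.0 + math.exp(-z))

def train(data, labels, lr=0.1, epochs=300):
    # Plain SGD logistic regression standing in for the classifier.
    w = [0.0] * len(data[0])
    for _ in range(epochs):
        for x, y in zip(data, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i in range(len(w)):
                w[i] += lr * (y - p) * x[i]
    return w

def reward(w, seq):
    # The classifier's probability of "good" is the automated reward.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(seq))))

# Train on labeled examples, then score new sequences without a human.
examples = [make_good() for _ in range(50)] + [make_bad() for _ in range(50)]
labels = [1] * 50 + [0] * 50
w = train([features(s) for s in examples], labels)
```

An RL agent composing note sequences could then call `reward(w, seq)` at the end of each episode, which is what lets the training loop run unattended for thousands of iterations.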

In general, automated music composition has a long history that pre-dates modern computers, and there are a lot of related projects out there that you could use for inspiration. One example is Google’s Magenta project.


I’ve seen the music idea done before, but nonetheless, each time someone had an AI compose music the results were very entertaining.