Understanding DDPG (and implementation in MATLAB)


#1

Hello everyone!

I stumbled across a Google DeepMind paper called “Continuous Control with Deep Reinforcement Learning” (link: https://arxiv.org/abs/1509.02971) proposing a new reinforcement learning algorithm for continuous state and action spaces. The algorithm is pretty much summed up on page 5. Is anyone familiar with it?

Anyway, I have a couple of questions regarding the calculation of the policy gradient ∇J (see p. 5):

  1. Do I understand correctly that ∇J is a vector?
  2. If a is one-dimensional, is ∇Q w.r.t. a just the partial derivative of Q (∂Q/∂a), and therefore a scalar?
  3. If a is multidimensional (and therefore ∇Q w.r.t. a is a vector), how are we supposed to compute the product of two vectors of different lengths (∇Q * ∇µ)?

I really hope someone can help me with this. Thank you in advance!

By the way: does anyone know whether this algorithm has already been implemented in MATLAB? I haven’t found anything yet. Is it even possible to do so?

Thank you!


#2

Hi. I took a look at the paper and can offer a few thoughts that might help.

  1. I believe so, yes: ∇J is a vector. It has as many components as there are parameters (weights) in the actor network.

  2. In the first part of the paper, a is said to be N-dimensional (page 2). The way I think about it is that each component of a applies to a separate joint (a has 7 components in the case of the 7-DOF arm, for example), and each component is a continuous real number (which might represent a torque applied at that joint).

  3. If you look at equation 6 on page 3, it might help with this question. To compute ∇J, they take the gradient of the Q (critic) network with respect to a, and the gradient of the μ (actor) network with respect to μ's parameters. According to the paper by Silver et al. (http://proceedings.mlr.press/v32/silver14.pdf, see equation 7), the gradient of μ is actually a matrix (rows = number of parameters, columns = number of action dimensions). So it is not a product of two vectors of different lengths, but a matrix-vector product, and the result is a vector with one component per actor parameter, matching point 1.
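To make the shapes in point 3 concrete, here is a toy pure-Python sketch (not from the paper; the linear actor and quadratic critic are made-up stand-ins for the networks). It builds the Jacobian of μ w.r.t. its parameters, the gradient of Q w.r.t. a, and multiplies them, yielding one gradient entry per actor weight:

```python
# Hypothetical toy example of the chain rule behind the DDPG policy gradient.
# Actor: mu(s) = W @ s, with W of shape (action_dim, state_dim).
# Critic: Q(s, a) = -sum(a_i^2), so grad_a Q = -2a.
# For one state s:  grad_theta J = (grad_theta mu) @ (grad_a Q),
# i.e. a (num_params x action_dim) matrix times an (action_dim,) vector.

state_dim, action_dim = 3, 2
s = [1.0, 2.0, -1.0]
W = [[0.1, 0.0, 0.2],    # row i produces action component i
     [0.0, -0.3, 0.1]]

def mu(W, s):
    """Linear 'actor network': one action component per row of W."""
    return [sum(w * x for w, x in zip(row, s)) for row in W]

a = mu(W, s)

# Gradient of Q w.r.t. a: a vector of length action_dim.
grad_a_Q = [-2.0 * a_i for a_i in a]

# Jacobian of mu w.r.t. the flattened parameters of W:
# d mu_i / d W[k][j] = s[j] if i == k, else 0
# -> matrix of shape (num_params, action_dim).
num_params = action_dim * state_dim
jac = [[0.0] * action_dim for _ in range(num_params)]
for k in range(action_dim):
    for j in range(state_dim):
        jac[k * state_dim + j][k] = s[j]

# Matrix-vector product: one gradient entry per actor parameter.
grad_J = [sum(jac[p][i] * grad_a_Q[i] for i in range(action_dim))
          for p in range(num_params)]

print(len(grad_J))  # 6 entries: one per weight in W (2 x 3)
```

The point is purely the bookkeeping: ∇Q w.r.t. a and ∇µ w.r.t. θ never need the same length, because one is a vector over actions and the other is a matrix mapping actions to parameters.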

Also, it looks like OpenAI has created a Python implementation: https://github.com/openai/baselines/tree/master/baselines/ddpg
Not MATLAB, but perhaps relatively straightforward to re-implement.

I hope this helps.


#3

Hey,

thanks mate! That really was helpful :slight_smile:


#4

Great. Thanks. :heavy_check_mark: