I stumbled across a Google DeepMind paper called “Continuous Control with Deep Reinforcement Learning” (link: https://arxiv.org/abs/1509.02971) that proposes a new reinforcement learning algorithm for continuous state and action spaces. The algorithm is summarized on page 5. Is anyone familiar with it?
Anyway, I have a couple of questions regarding the calculation of the policy gradient ∇J (see p. 5):
- Do I understand correctly that ∇J is a vector?
- If a is one-dimensional, is ∇Q w.r.t. a just the partial derivative of Q (∂Q/∂a) and therefore a scalar?
- If a is multidimensional (and therefore ∇Q w.r.t. a is a vector), how are we supposed to compute the product of two vectors of different lengths (∇Q * ∇µ)?
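To make the dimension question concrete, here is a minimal NumPy sketch of how I currently read the formula on p. 5, assuming that ∇µ (taken with respect to the actor parameters θ) is a Jacobian matrix rather than a vector. The linear policy `mu`, the quadratic critic `q`, and all names and sizes are hypothetical toy choices of mine, not taken from the paper.

```python
import numpy as np

# Toy sizes (assumptions for illustration only).
state_dim, action_dim = 3, 2
rng = np.random.default_rng(0)

s = rng.normal(size=state_dim)                # one state
W = rng.normal(size=(action_dim, state_dim))  # actor parameters theta (flattened row-major)
target = rng.normal(size=action_dim)          # makes the toy critic non-trivial


def mu(W, s):
    """Hypothetical linear deterministic policy: a = W s."""
    return W @ s


def q(s, a):
    """Hypothetical critic: Q(s, a) = -||a - target||^2 (ignores s for simplicity)."""
    return -np.sum((a - target) ** 2)


def policy_gradient(W, s):
    a = mu(W, s)
    # Gradient of Q w.r.t. the action: one entry per action dimension.
    grad_a_q = -2.0 * (a - target)                     # shape (action_dim,)
    # Jacobian of mu w.r.t. theta: d mu_i / d W_kj = delta_ik * s_j.
    jac_mu = np.kron(np.eye(action_dim), s[None, :])   # shape (action_dim, n_params)
    # Vector-Jacobian product: one entry per actor parameter.
    grad_j = grad_a_q @ jac_mu                         # shape (n_params,)
    return grad_a_q, jac_mu, grad_j


grad_a_q, jac_mu, grad_j = policy_gradient(W, s)
print(grad_a_q.shape, jac_mu.shape, grad_j.shape)  # (2,) (2, 6) (6,)
```

If this reading is right, the "product" is a vector-Jacobian product of shapes (action_dim,) and (action_dim, n_params), so ∇J has one entry per actor parameter. For a one-dimensional a it would reduce to the scalar ∂Q/∂a scaling the gradient of µ.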
I really hope someone can help me with this. Thank you in advance!
By the way, does anyone know whether this algorithm has already been implemented in MATLAB? I haven’t found anything yet. Is it even possible to do so?