DDPG in stochastic environments


Hello everyone,

I have a general question about the applicability of DDPG to partially unpredictable environments.

Let’s say I have five state variables: V, A, M, S, and L. V, A, and M do not depend on the action at all (but the agent observes them in order to choose the correct action); they are simply given at each step of an episode. S does depend on the action and may be considered deterministic. L is a little of both: the action influences its value, but there is also a stochastic component to it.
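To make the structure concrete, here is a minimal sketch of one transition. The update rules, coefficients, and noise level are made up purely for illustration and are not my real dynamics; only the dependency pattern (which variables see the action, which are noisy) matches the description above.

```python
import random

def step(state, action, rng):
    """One transition of a hypothetical environment with the structure described above."""
    V, A, M, S, L = state

    # V, A, M are exogenous: the next values ignore the action entirely.
    # (Drawn at random here only as a stand-in for "given at each step".)
    V_next = rng.uniform(0.0, 1.0)
    A_next = rng.uniform(0.0, 1.0)
    M_next = rng.uniform(0.0, 1.0)

    # S depends on the action and is deterministic (assumed form).
    S_next = S + action

    # L depends on the action AND has a stochastic component (assumed form).
    L_next = 0.9 * L + 0.5 * action + rng.gauss(0.0, 0.1)

    return (V_next, A_next, M_next, S_next, L_next)


if __name__ == "__main__":
    rng = random.Random(0)
    state = (0.2, 0.4, 0.6, 1.0, 2.0)
    print(step(state, 0.5, rng))
```

So from the agent's point of view, the same (state, action) pair can lead to different next states, because of V, A, M and the noise in L.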

Now, my question is: does it make sense to apply DDPG to this kind of environment?

Additional question: what about non-deterministic rewards, i.e. a transition from one particular state X to a state Y does not always produce the same reward? Is DDPG applicable here?
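To illustrate what I mean by a non-deterministic reward, here is a toy sketch with made-up numbers. It is not DDPG itself, just a single scalar estimate for one fixed X -> Y transition, nudged toward each sampled reward with a small step size (the kind of incremental squared-error update a critic performs, with the bootstrapped term left out for simplicity). The names `sample_reward` and `fit_value`, the mean of 1.0, and the noise level are all assumptions for the example.

```python
import random

def sample_reward(rng, mean=1.0, noise_std=0.5):
    """Hypothetical noisy reward for one fixed transition X -> Y."""
    return mean + rng.gauss(0.0, noise_std)

def fit_value(num_updates=20000, step_size=0.01, seed=0):
    """Move a scalar estimate toward each sampled reward (a stochastic
    gradient step on the squared error between estimate and reward)."""
    rng = random.Random(seed)
    estimate = 0.0
    for _ in range(num_updates):
        reward = sample_reward(rng)
        estimate += step_size * (reward - estimate)
    return estimate


if __name__ == "__main__":
    print(fit_value())
```

In this toy case the estimate settles near the mean reward rather than any single sample. What I would like to know is whether the critic in DDPG can be relied on to behave in the same way.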

Thank you in advance!