Related to: [Name of Request for Research]
In reinforcement learning, particularly in actor-critic settings, we use knowledge from a critic that estimates state values to guide the approximation of a policy.
I'm wondering whether it is better to have two separate networks, or to share weights and specialize only in the last layer(s). I've seen it done both ways. What do you think?
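To make the two options concrete, here is a minimal sketch contrasting them (the dimensions, layer sizes, and helper names are all my own illustrative choices, not taken from any particular codebase). Fully separate networks duplicate the feature-extraction layers, while the shared variant computes one trunk representation and feeds it to both a policy head and a value head:

```python
import random

def linear(n_in, n_out):
    """A dense layer as (weights, biases); illustrative stdlib-only sketch."""
    w = [[random.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b

def forward(layer, x):
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(v):
    return [max(0.0, x) for x in v]

def n_params(layers):
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

obs_dim, hidden, n_actions = 8, 64, 4  # arbitrary example sizes

# Option A: two fully separate networks (actor and critic share nothing).
actor = [linear(obs_dim, hidden), linear(hidden, n_actions)]
critic = [linear(obs_dim, hidden), linear(hidden, 1)]

# Option B: shared trunk, with specialization only in the heads.
trunk = [linear(obs_dim, hidden)]
policy_head = [linear(hidden, n_actions)]
value_head = [linear(hidden, 1)]

print("separate params:", n_params(actor) + n_params(critic))
print("shared params  :", n_params(trunk) + n_params(policy_head) + n_params(value_head))

# Forward pass through the shared variant: one trunk evaluation, two heads.
obs = [random.random() for _ in range(obs_dim)]
h = relu(forward(trunk[0], obs))
logits = forward(policy_head[0], h)   # actor output (action logits)
value = forward(value_head[0], h)[0]  # critic output (scalar state value)
```

Besides the parameter savings, the practical trade-off is that the shared trunk lets the policy benefit from features learned through the value loss (and vice versa), but it also couples the two gradient signals, so their relative scales usually need to be balanced with a value-loss coefficient.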