Shared network for policy and value estimation


Related to: [Name of Request for Research]

Hello!
In reinforcement learning, particularly in actor-critic settings, we want to use the knowledge of a critic that estimates state values to guide the learning of a policy.

I’m wondering whether it is better to have two separate networks, or to share the weights and only specialize in the last layer(s). I’ve seen it done both ways. What do you think?
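For concreteness, here is a minimal NumPy sketch of the shared-trunk variant I mean (layer sizes, initialization, and the single hidden layer are all arbitrary assumptions for illustration): one body computes features, and two small heads branch off it, one producing policy logits and one producing a scalar value estimate. The separate-network alternative would simply duplicate the trunk so the actor and critic share nothing.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, hidden, n_actions = 4, 32, 2  # hypothetical sizes

# Shared trunk: one set of weights feeding both heads
W_trunk = rng.normal(scale=0.1, size=(obs_dim, hidden))
W_policy = rng.normal(scale=0.1, size=(hidden, n_actions))  # actor head
W_value = rng.normal(scale=0.1, size=(hidden, 1))           # critic head

def shared_forward(obs):
    """Actor and critic outputs from one shared feature extractor."""
    h = np.tanh(obs @ W_trunk)           # shared features
    logits = h @ W_policy                # policy logits (actor)
    value = (h @ W_value).squeeze(-1)    # state-value estimate (critic)
    return logits, value

obs = rng.normal(size=(3, obs_dim))      # a batch of 3 observations
logits, value = shared_forward(obs)
print(logits.shape, value.shape)         # (3, 2) (3,)
```

With sharing, gradients from both the policy loss and the value loss flow into `W_trunk`, which is exactly where the trade-off lives: the two objectives can either regularize each other or interfere.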

Thanks!