About the expected reward in Markov processes


#1

Hi everyone,

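I am new to RL and this is my first question here. I just need some clarification about the expected reward for state–action–next-state triples. As defined in Sutton and Barto's book, it is:

$$r(s, a, s') \doteq \mathbb{E}[R_t \mid S_{t-1}=s, A_{t-1}=a, S_t=s'] = \sum_{r \in \mathcal{R}} r \, \frac{p(s', r \mid s, a)}{p(s' \mid s, a)}$$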

My question is simply: why do we divide by p(s’|s,a)?

All your explanations are welcome!
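
To make the question concrete, here is a minimal sketch with a made-up toy distribution (the states, rewards, and probabilities are hypothetical, not from the book). It assumes the division is the usual conditional-probability rule, p(r | s, a, s') = p(s', r | s, a) / p(s' | s, a), and compares the sum with and without the division for one fixed (s, a):

```python
# Hypothetical joint dynamics p(s', r | s, a) for one fixed (s, a).
# Keys are (next_state, reward); the numbers are invented for illustration.
joint = {
    ("s1", 0.0): 0.1,
    ("s1", 1.0): 0.3,
    ("s2", 0.0): 0.4,
    ("s2", 5.0): 0.2,
}


def p_next(s_next):
    """Marginal p(s' | s, a): sum the joint over all rewards."""
    return sum(p for (sn, _), p in joint.items() if sn == s_next)


def unnormalized_sum(s_next):
    """sum_r r * p(s', r | s, a), i.e. the formula WITHOUT the division."""
    return sum(r * p for (sn, r), p in joint.items() if sn == s_next)


def expected_reward(s_next):
    """r(s, a, s') = sum_r r * p(s', r | s, a) / p(s' | s, a)."""
    return unnormalized_sum(s_next) / p_next(s_next)


for s_next in ("s1", "s2"):
    print(s_next, p_next(s_next), unnormalized_sum(s_next), expected_reward(s_next))
```

In this toy case, landing in s1 gives reward 1 three times out of four, so the average reward given s' = s1 should be about 0.75. The sum without the division gives 0.3, because the weights for s1 only add up to p(s1 | s, a) = 0.4 rather than 1; dividing by that marginal rescales them into a proper distribution over rewards.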