Trouble understanding REINFORCE (pros help!)


After looking at the REINFORCE formula countless times, I still don't understand how its terms got in there. E.g., why does summing the rewards of the trajectories, each multiplied by its probability, give us the cost J?
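
For reference, this is the expression I mean, written out as I understand it. The notation is my assumption of the usual convention, not taken from a specific source: τ is one trajectory, P(τ; θ) is its probability under the policy with parameters θ, and R(τ) is the total reward collected along it.

```latex
J(\theta) \;=\; \sum_{\tau} P(\tau;\theta)\, R(\tau)
```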

Please explain how it works if you understand it.