CartPole policy gradient problem


I am working on part 1 of the Requests for Research.

However, the total average reward is still stuck around 10.

I can't find the problem, which is why I am posting this question.

Please help me!

Gist link:
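(For readers hitting the same plateau: a reward stuck near 10 on CartPole usually means the policy is not improving at all. As a point of comparison, here is a minimal NumPy sketch of the vanilla REINFORCE update on a linear softmax policy, with discounted, normalized returns. All names here are illustrative; this is not the code from the gist.)

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Compute the discounted return G_t for every timestep, then normalize."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    # Normalizing returns keeps the gradient scale stable across episodes.
    return (returns - returns.mean()) / (returns.std() + 1e-8)

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def reinforce_step(theta, states, actions, rewards, lr=0.01, gamma=0.99):
    """One REINFORCE update: theta += lr * sum_t G_t * grad log pi(a_t|s_t)."""
    returns = discounted_returns(rewards, gamma)
    grad = np.zeros_like(theta)
    for s, a, g in zip(states, actions, returns):
        probs = softmax(theta.T @ s)   # pi(.|s) for a linear policy
        # Gradient of log softmax w.r.t. the logits: one_hot(a) - probs
        dlogits = -probs
        dlogits[a] += 1.0
        grad += g * np.outer(s, dlogits)
    return theta + lr * grad
```

A sign error in the loss or forgetting to discount/normalize the returns are common causes of a flat reward curve.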


Hi, I have implemented actor-critic REINFORCE on CartPole. It works well. Please check:


Thank you for your help.

I happen to be testing the actor-critic algorithm on the Pong game.
Your sample will help me understand actor-critic more concretely.

Pong actor-critic link:

From Dohyeong Kim


@srikanthmalla I saw that for both your actor and your critic, you feed and train on each episode 100 times. Does doing it this way help you train the network more quickly? What is the reason behind it? The relevant code is below.

# Train to update weights
def train(self, states, actions, advantages):
    """Trains the neural network and updates its parameters."""
    epochs = 100  # question here
    for _ in range(epochs):  # question here
        _, c = self.sess.run(
            [self.optimizer, self.loss],
            feed_dict={
                self.x: states,
                self.actions: actions,
                self.advantages: advantages,
            })
    return c

Thanks, Cheng
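(Some context on the question above, from my own reading rather than from the thread: REINFORCE-style policy gradient is on-policy, so a collected batch is typically consumed in a single gradient step; looping 100 times over the same episode reuses stale data and can overfit that one trajectory. The textbook single-pass update looks like this, with illustrative names:)

```python
import numpy as np

def train_once(theta, grad_log_probs, advantages, lr=1e-2):
    """Single on-policy update: one gradient step per collected batch.

    grad_log_probs: grad log pi(a_t|s_t) per timestep, shape (T, dim(theta))
    advantages:     advantage estimates per timestep, shape (T,)
    """
    # Weight each log-prob gradient by its advantage, sum over the batch.
    grad = (advantages[:, None] * grad_log_probs).sum(axis=0)
    return theta + lr * grad
```

After this one step the policy has changed, so fresh episodes must be collected before the next update.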