DQN training data calculation


Hi everyone!
I’ve implemented the method below to build the DQN training data from transition tuples. Do you think it is correct, or am I doing something wrong? It is my first implementation of DQN (I followed DeepMind’s pseudocode). Feel free to comment with your thoughts.

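import numpy as np

def processBatch(data_arr, df, DQN, DQN_target):
    X, Y = [], []
    for elem in data_arr:
        s1, action, reward, done, s2 = elem.getValues()
        # Start from the current predictions; only the taken action's entry changes
        y = DQN.predict(s1)
        y[action] = reward
        if not done:
            y[action] = df * max(DQN_target.predict(s2))
        # Store the state and its target vector
        X.append(s1)
        Y.append(y)
    return np.array(X), np.array(Y)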


Looks good, however this line overwrites the reward whenever the episode is not done:

y[action] = df * max(DQN_target.predict(s2))

should be

y[action] = reward + df * max(DQN_target.predict(s2))


Or, equivalently, since y[action] already holds the reward at that point:

y[action] += df * max(DQN_target.predict(s2))
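For reference, here is a minimal sketch of the whole function with the reward term included in the target. It assumes `predict` takes a single state and returns a 1-D array of Q-values (with Keras-style models you would need to add and strip a batch dimension), and that each element exposes `getValues()` as in the original post. The `Transition` and `ConstantQ` classes are hypothetical stand-ins so the snippet runs on its own; they are not part of the original code.

```python
import numpy as np


class Transition:
    """Hypothetical stand-in for the poster's transition object."""

    def __init__(self, s1, action, reward, done, s2):
        self._values = (s1, action, reward, done, s2)

    def getValues(self):
        return self._values


class ConstantQ:
    """Hypothetical stub network: returns the same Q-values for every state."""

    def __init__(self, q_values):
        self.q_values = np.asarray(q_values, dtype=float)

    def predict(self, state):
        return self.q_values.copy()


def processBatch(data_arr, df, DQN, DQN_target):
    X, Y = [], []
    for elem in data_arr:
        s1, action, reward, done, s2 = elem.getValues()
        # Start from the online network's predictions so that only the
        # entry for the action actually taken differs from the prediction.
        y = np.array(DQN.predict(s1), dtype=float)
        y[action] = reward
        if not done:
            # Bootstrapped target: reward + df * max over actions of Q_target(s2)
            y[action] += df * np.max(DQN_target.predict(s2))
        # Store the state and its target vector so the arrays are populated
        X.append(s1)
        Y.append(y)
    return np.array(X), np.array(Y)


# Tiny check with two transitions: one non-terminal, one terminal
online = ConstantQ([0.0, 0.0])
target = ConstantQ([1.0, 3.0])
batch = [
    Transition(np.zeros(4), 1, 1.0, False, np.ones(4)),
    Transition(np.zeros(4), 0, 2.0, True, np.ones(4)),
]
X, Y = processBatch(batch, 0.9, online, target)
```

With these stub values, the non-terminal transition gets a target of 1.0 + 0.9 * 3.0 for action 1, while the terminal one just gets its reward of 2.0 for action 0.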


Thanks, that’s actually the first bug I noticed the morning after I wrote the code :)