Why don't neural networks make use of temporal difference?


#1

So, I was reading these two articles.

The first one makes use of temporal difference learning with a Q-table.

The second one just implements the regular Bellman equation without using the temporal difference algorithm. This is done in the context of a neural network.

Am I missing something here? I thought temporal difference was very important. What's going on in the context of neural networks?


#2

The Bellman equation is essentially the same equation as the temporal difference equation used in TD learning. There are a few variations of it, depending on what you know about the environment.
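For reference, here is the relationship in standard notation (neither article spells this out, so take the symbols as the usual textbook ones). The Bellman optimality equation for action values is

$$Q^*(s,a) = \mathbb{E}\left[\, r + \gamma \max_{a'} Q^*(s', a') \,\middle|\, s, a \,\right]$$

and the single-step Q-learning update is built directly from it:

$$Q(s,a) \leftarrow Q(s,a) + \alpha \Big[\, \underbrace{r + \gamma \max_{a'} Q(s',a')}_{\text{TD target}} - Q(s,a) \,\Big]$$

where the whole bracketed term is the TD error.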

Q-learning is a form of TD learning. Specifically, it uses action values (as opposed to state values), it uses sampling (as opposed to directly querying a model of the environment), it is off-policy (learning an optimal policy while observing non-optimal behaviour), and in its basic form it is single-step. Those four traits affect how the TD target and TD error are calculated.
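As a concrete illustration, here is what that single-step, sampled, action-value update looks like in tabular form. This is a minimal NumPy sketch; the function and variable names are mine, not taken from either article:

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: move Q[s, a] toward the TD target."""
    td_target = r + gamma * np.max(Q[s_next])   # bootstrap from the next state
    td_error = td_target - Q[s, a]              # how wrong the current estimate is
    Q[s, a] += alpha * td_error                 # nudge the table entry
    return td_error

# Toy usage: 5 states, 2 actions, one made-up transition.
Q = np.zeros((5, 2))
q_learning_update(Q, s=0, a=1, r=1.0, s_next=3)
```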

The big difference when jumping from tabular to neural net variants is dropping the Q table in favour of a Q estimation function. However, these both represent the same thing to the agent - the agent’s current best guess of the expected return given a current state and action. And they both receive very similar updates due to experience.
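Here is the same update with a Q estimation function standing in for the table. This is a minimal PyTorch-style sketch for illustration only (the second article may use a different framework, and the network and hyperparameters here are made up), but the TD target is computed exactly as in the tabular case:

```python
import torch
import torch.nn as nn

# Hypothetical network: maps a 4-dimensional state vector to one Q-value per action.
q_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

def q_net_update(state, action, reward, next_state):
    """One DQN-style step: regress Q(s, a) toward the same TD target."""
    with torch.no_grad():  # treat the target as a fixed label, no gradient through it
        td_target = reward + gamma * q_net(next_state).max()
    prediction = q_net(state)[action]
    loss = (prediction - td_target) ** 2  # squared TD error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with made-up state vectors.
s = torch.randn(4)
s_next = torch.randn(4)
q_net_update(s, action=1, reward=1.0, next_state=s_next)
```

The only real change is that assigning to a table entry becomes a gradient step on the squared TD error; the TD target itself is the same in both cases.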