Why don't neural networks make use of temporal difference?


So, I was reading these two articles.

The first one makes use of temporal difference in a Q-table.

The second one just implements the regular Bellman equation, without making use of the temporal difference algorithm, in the context of a neural network.

Am I missing something here? I thought temporal difference was very important. What's going on in the context of neural networks?


The Bellman equation is essentially the same equation as the temporal difference equation used in TD learning. There are a few variations of it, depending on what you know about the environment.
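
For concreteness, here is the Bellman optimality equation for action values in standard notation (my notation, not taken from either article), together with the sampled one-step target derived from it:

$$Q^*(s,a) \;=\; \mathbb{E}\!\left[\, R_{t+1} + \gamma \max_{a'} Q^*(S_{t+1}, a') \;\middle|\; S_t = s,\, A_t = a \,\right]$$

The one-step TD target, $R_{t+1} + \gamma \max_{a'} Q(S_{t+1}, a')$, is just a sampled estimate of that expectation, using the agent's current guess $Q$ in place of the unknown $Q^*$.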

Q-learning is a form of TD learning. Specifically, it:

- uses action values (as opposed to state values);
- uses sampling (as opposed to directly querying a model of the environment);
- is off-policy (it learns an optimal policy from observing non-optimal behaviour);
- is single-step in its basic form.

Those four traits affect how the TD target and TD error are calculated, as in the sketch below.
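
In tabular form, a minimal sketch of that update might look like this (sizes and hyperparameters are invented for illustration, not taken from the first article):

```python
import numpy as np

# Hypothetical sizes and hyperparameters, for illustration only.
n_states, n_actions = 16, 4
Q = np.zeros((n_states, n_actions))   # the Q-table
alpha, gamma = 0.1, 0.99              # learning rate, discount factor

def q_update(s, a, r, s_next, done):
    """Single-step, off-policy, sampled TD update on action values."""
    # TD target: one sampled reward plus the discounted greedy bootstrap.
    td_target = r + gamma * (0.0 if done else Q[s_next].max())
    td_error = td_target - Q[s, a]
    Q[s, a] += alpha * td_error       # move the table entry toward the target
```

All four traits show up here: `Q[s, a]` stores action values, the update uses a single sampled transition, the `max` over next actions makes it off-policy, and one reward plus one bootstrap makes it single-step.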

The big difference when jumping from tabular to neural network variants is dropping the Q-table in favour of a Q estimation function. However, both represent the same thing to the agent: its current best guess of the expected return given the current state and action. And both receive very similar updates from experience.
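
To make the "very similar updates" point concrete, here is a minimal sketch of the same update with a network instead of a table (assuming PyTorch; the architecture and hyperparameters are invented, and practical DQN-style implementations add replay buffers and target networks on top of this):

```python
import torch
import torch.nn as nn

# Hypothetical architecture: 4-dimensional states, 2 actions.
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

def q_update(state, action, reward, next_state, done):
    """Same single-step TD update, but as a gradient step on the network."""
    q_sa = q_net(state)[action]       # current estimate of Q(s, a)
    with torch.no_grad():             # the TD target is treated as a constant
        bootstrap = 0.0 if done else q_net(next_state).max()
        td_target = reward + gamma * bootstrap
    loss = (td_target - q_sa) ** 2    # squared TD error
    optimizer.zero_grad()
    loss.backward()                   # gradient step replaces the table write
    optimizer.step()
```

The table write `Q[s, a] += alpha * td_error` and the gradient step both move the current estimate toward the same TD target; the optimizer's learning rate plays the role of `alpha`.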