This series on reinforcement learning discusses two important limitations that can make Q-learning unstable: the Bellman error and optimism.

The first problem arises because the regression target in the Bellman error is itself a prediction: every gradient step changes both the current-state value and the next-state value it is regressed toward, so the loss chases a moving target and can run away. A practical fix is to compute the current-state and next-state values with two similar but distinct networks, for example the original DQN's approach of keeping a periodically refreshed copy of the Q-network, or maintaining a target network whose weights are an exponential moving average of the online weights (see the first sketch below).

The second problem, optimism, also stems from the Bellman equation: Q-functions optimized via the Bellman error tend to be too optimistic and overestimate future returns. Double Q-networks address this by training two independent Q-functions and taking the minimum of their estimates as the target prediction, which yields more conservative estimates of the Q-function than DQN (see the second sketch below).

Understanding these limitations and their fixes builds intuition for how RL systems behave and points to practical remedies for common challenges when training RL algorithms.
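
Below is a minimal sketch of the target-network idea in PyTorch. The network architecture, hyperparameters, and helper names (`QNetwork`, `hard_update`, `soft_update`, `td_target`) are illustrative assumptions rather than the original DQN code; it only shows the two variants mentioned above: a periodically refreshed copy and an exponentially-moving-average target.

```python
import copy

import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Small MLP mapping a batch of states to one Q-value per action."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


q_net = QNetwork(obs_dim=4, n_actions=2)
target_net = copy.deepcopy(q_net)  # separate copy used only for next-state values
for p in target_net.parameters():
    p.requires_grad_(False)


def hard_update(online: nn.Module, target: nn.Module) -> None:
    """DQN-style update: overwrite the target with the online weights every N steps."""
    target.load_state_dict(online.state_dict())


def soft_update(online: nn.Module, target: nn.Module, tau: float = 0.005) -> None:
    """EMA-style update: let the target slowly track the online weights."""
    with torch.no_grad():
        for p_t, p_o in zip(target.parameters(), online.parameters()):
            p_t.mul_(1.0 - tau).add_(tau * p_o)


def td_target(reward: torch.Tensor, next_obs: torch.Tensor,
              done: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Bellman target computed with the slowly-changing target network,
    so the regression target does not move every gradient step."""
    with torch.no_grad():
        next_q = target_net(next_obs).max(dim=1).values
    return reward + gamma * (1.0 - done) * next_q
```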
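
And here is a hedged sketch of the min-over-two-critics target described above: two independently trained Q-functions, with the smaller of their next-state estimates used as the Bellman target. The function and parameter names (`clipped_double_q_target`, `q1_target`, `q2_target`) are hypothetical and assume the same `QNetwork`-style modules as the previous sketch.

```python
import torch
import torch.nn as nn


def clipped_double_q_target(q1_target: nn.Module, q2_target: nn.Module,
                            reward: torch.Tensor, next_obs: torch.Tensor,
                            done: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Conservative Bellman target: the minimum over two independently trained critics."""
    with torch.no_grad():
        next_q1 = q1_target(next_obs).max(dim=1).values
        next_q2 = q2_target(next_obs).max(dim=1).values
        next_q = torch.min(next_q1, next_q2)  # pessimism curbs the overestimation bias
    return reward + gamma * (1.0 - done) * next_q
```

Because each critic makes different errors, taking the minimum rarely passes an overestimated value through as the target, which is why this construction produces more conservative Q-estimates than a single DQN-style target.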