Addressing Function Approximation Error in Actor-Critic Methods
Scott Fujimoto, Herke van Hoof, and Dave Meger
arXiv e-Print archive - 2018
Keywords:
cs.AI, cs.LG, stat.ML
First published: 2018/02/26
Abstract: In value-based reinforcement learning methods such as deep Q-learning,
function approximation errors are known to lead to overestimated value
estimates and suboptimal policies. We show that this problem persists in an
actor-critic setting and propose novel mechanisms to minimize its effects on
both the actor and critic. Our algorithm takes the minimum value between a pair
of critics to restrict overestimation and delays policy updates to reduce
per-update error. We evaluate our method on the suite of OpenAI gym tasks,
outperforming the state of the art in every environment tested.
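
The two mechanisms named in the abstract, taking the minimum over a pair of critics to form the value target and updating the policy less often than the critics, can be illustrated with a short sketch. The sketch below is not the authors' reference implementation: it assumes PyTorch-style actor and critic modules, pre-built optimizers, and a batch sampled from an external replay buffer, and the function name `td3_update`, argument layout, and default hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

def td3_update(actor, actor_target, critic1, critic2, critic1_target, critic2_target,
               actor_opt, critic_opt, batch, step,
               gamma=0.99, tau=0.005, policy_delay=2):
    # batch: (state, action, reward, next_state, not_done) tensors from a
    # replay buffer (assumed to exist elsewhere).
    state, action, reward, next_state, not_done = batch

    with torch.no_grad():
        next_action = actor_target(next_state)
        # Take the minimum over the pair of target critics to restrict
        # overestimation of the bootstrapped value target.
        target_q = torch.min(critic1_target(next_state, next_action),
                             critic2_target(next_state, next_action))
        target_q = reward + not_done * gamma * target_q

    # Both critics regress toward the same clipped target.
    critic_loss = (nn.functional.mse_loss(critic1(state, action), target_q)
                   + nn.functional.mse_loss(critic2(state, action), target_q))
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Delayed policy updates: the actor and the target networks change less
    # frequently than the critics, reducing per-update error.
    if step % policy_delay == 0:
        actor_loss = -critic1(state, actor(state)).mean()
        actor_opt.zero_grad()
        actor_loss.backward()
        actor_opt.step()

        # Slowly track the learned networks with the target networks.
        for target, source in ((actor_target, actor),
                               (critic1_target, critic1),
                               (critic2_target, critic2)):
            for tp, sp in zip(target.parameters(), source.parameters()):
                tp.data.mul_(1.0 - tau).add_(tau * sp.data)
```

The `policy_delay` argument controls how many critic updates occur per actor update; the clipped target is shared by both critics so that neither learns from an overestimated bootstrap value.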