Reinforcement Learning with Unsupervised Auxiliary Tasks
Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z Leibo, David Silver and Koray Kavukcuoglu
arXiv e-Print archive, 2016
Keywords: cs.LG, cs.NE
First published: 2016/11/16

Abstract: Deep reinforcement learning agents have achieved state-of-the-art results by
directly maximising cumulative reward. However, environments contain a much
wider variety of possible training signals. In this paper, we introduce an
agent that also maximises many other pseudo-reward functions simultaneously by
reinforcement learning. All of these tasks share a common representation that,
like unsupervised learning, continues to develop in the absence of extrinsic
rewards. We also introduce a novel mechanism for focusing this representation
upon extrinsic rewards, so that learning can rapidly adapt to the most relevant
aspects of the actual task. Our agent significantly outperforms the previous
state-of-the-art on Atari, averaging 880% expert human performance, and a
challenging suite of first-person, three-dimensional Labyrinth tasks
leading to a mean speedup in learning of 10× and averaging 87% expert
human performance on Labyrinth.
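
The core idea in the abstract — several pseudo-reward tasks trained jointly on top of one shared representation — can be summarised in a short sketch. The following is a minimal PyTorch-style illustration; the class name, layer sizes, and the specific auxiliary heads are assumptions chosen for brevity, not the paper's exact agent (which is an A3C-style LSTM agent with pixel-control, reward-prediction, and value-replay auxiliary tasks):

```python
import torch
import torch.nn as nn


class UnrealStyleAgent(nn.Module):
    """Illustrative agent: one shared encoder, a main policy/value head,
    and auxiliary pseudo-reward heads that all train the same features."""

    def __init__(self, num_actions: int, obs_channels: int = 3):
        super().__init__()
        # Shared encoder: updated by the extrinsic-reward task and by every
        # auxiliary task, so it keeps learning even when extrinsic rewards
        # are sparse or absent.
        self.encoder = nn.Sequential(
            nn.Conv2d(obs_channels, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256), nn.ReLU(),  # assumes 84x84 input frames
        )
        # Main actor-critic heads for the extrinsic reward.
        self.policy = nn.Linear(256, num_actions)
        self.value = nn.Linear(256, 1)
        # Auxiliary heads (illustrative): classify the sign of the immediate
        # reward, and estimate action values for maximally changing pixel
        # intensities in a coarse grid of image regions (a pseudo-reward
        # control task).
        self.reward_pred = nn.Linear(256, 3)  # negative / zero / positive reward
        self.num_actions = num_actions
        self.pixel_control = nn.Linear(256, num_actions * 20 * 20)

    def forward(self, obs: torch.Tensor) -> dict:
        h = self.encoder(obs)
        return {
            "policy_logits": self.policy(h),
            "value": self.value(h).squeeze(-1),
            "reward_logits": self.reward_pred(h),
            "pixel_q": self.pixel_control(h).view(-1, self.num_actions, 20, 20),
        }


# Every head reads the same 256-d features, so gradients from the auxiliary
# losses flow back into the shared encoder.
agent = UnrealStyleAgent(num_actions=6)
out = agent(torch.zeros(2, 3, 84, 84))
print({k: tuple(v.shape) for k, v in out.items()})
```

In this kind of setup the training objective would be the usual actor-critic loss plus weighted auxiliary losses, all backpropagated through the shared encoder. The "mechanism for focusing this representation upon extrinsic rewards" mentioned in the abstract is realised in the paper by biasing the replayed experience used for the reward-prediction task toward rewarding events.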