Reinforcement Learning with Unsupervised Auxiliary Tasks
Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z. Leibo, David Silver, Koray Kavukcuoglu
arXiv e-Print archive, 2016
Keywords:
cs.LG, cs.NE
First published: 2016/11/16
Abstract: Deep reinforcement learning agents have achieved state-of-the-art results by directly maximising cumulative reward. However, environments contain a much wider variety of possible training signals. In this paper, we introduce an agent that also maximises many other pseudo-reward functions simultaneously by reinforcement learning. All of these tasks share a common representation that, like unsupervised learning, continues to develop in the absence of extrinsic rewards. We also introduce a novel mechanism for focusing this representation upon extrinsic rewards, so that learning can rapidly adapt to the most relevant aspects of the actual task. Our agent significantly outperforms the previous state-of-the-art on Atari, averaging 880% expert human performance, and a challenging suite of first-person, three-dimensional Labyrinth tasks, leading to a mean speedup in learning of 10× and averaging 87% expert human performance on Labyrinth.
The authors describe a reinforcement learning agent that simultaneously learns to solve several auxiliary tasks alongside the main one, which speeds up and improves learning on the main objective.
https://i.imgur.com/fmTVxvr.png
In addition to the base A3C agent, which learns a policy and value function from the environment reward, they have the system learn 1) separate auxiliary policies (trained by Q-learning) for maximally changing the pixels on the screen, 2) auxiliary policies for maximally activating units in a hidden layer, and 3) a predictor of the reward at the next step, trained on replayed histories sampled so that rewarding events are over-represented. All of these share the agent's representation, and their losses are simply added to the main loss, as sketched below. They show that this improves learning speed and final performance on Atari games and on Labyrinth (a first-person 3D environment in the style of Quake).
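To make the structure concrete, here is a minimal PyTorch sketch of the idea: a shared encoder feeds a main actor-critic head plus pixel-control and reward-prediction heads, and their losses are summed with weighting coefficients. Everything here (module names, the flat encoder, the 20x20 pixel grid, the loss weights, the batch keys) is my own simplification for illustration, not the paper's actual CNN+LSTM architecture or hyperparameters, and the hidden-unit (feature-control) task is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnrealSketch(nn.Module):
    """Hypothetical, simplified stand-in for the agent described above."""
    def __init__(self, n_actions, feat_dim=256):
        super().__init__()
        # Shared representation used by the main task and all auxiliary tasks.
        self.encoder = nn.Sequential(nn.Linear(84 * 84, feat_dim), nn.ReLU())
        self.policy = nn.Linear(feat_dim, n_actions)      # main actor head
        self.value = nn.Linear(feat_dim, 1)               # main critic head
        # Auxiliary heads (the real model also has a feature-control head):
        self.pixel_q = nn.Linear(feat_dim, n_actions * 20 * 20)  # Q-values over a 20x20 pixel grid
        self.reward_pred = nn.Linear(feat_dim, 3)         # classifies next reward as negative/zero/positive

    def forward(self, obs_flat):
        h = self.encoder(obs_flat)
        return self.policy(h), self.value(h), self.pixel_q(h), self.reward_pred(h)


def total_loss(model, batch, lambda_pc=0.05, lambda_rp=1.0):
    """Sum of the main actor-critic loss and the auxiliary losses."""
    logits, value, pixel_q, rew_logits = model(batch["obs"])

    # Main task: simplified advantage actor-critic terms (entropy bonus omitted).
    adv = batch["returns"] - value.squeeze(-1)
    log_prob = F.log_softmax(logits, dim=-1)
    log_prob_a = log_prob.gather(1, batch["actions"].unsqueeze(1)).squeeze(1)
    loss_main = -(log_prob_a * adv.detach()).mean() + 0.5 * adv.pow(2).mean()

    # Auxiliary 1: pixel control -- Q-learning regression toward targets whose
    # "reward" is the change in pixel intensity within each grid cell.
    q = pixel_q.view(pixel_q.size(0), -1, 20 * 20)
    q_taken = q[torch.arange(q.size(0)), batch["actions"]]
    loss_pc = F.mse_loss(q_taken, batch["pixel_change_targets"])

    # Auxiliary 2: reward prediction -- classification of the upcoming reward
    # from a replay batch sampled so rewarding frames are over-represented.
    loss_rp = F.cross_entropy(rew_logits, batch["reward_class"])

    return loss_main + lambda_pc * loss_pc + lambda_rp * loss_rp
```

The key design point the sketch is meant to show is that the auxiliary tasks do not get their own representations: they only add extra heads and extra loss terms on top of the shared encoder, so their gradients shape the same features the main policy uses.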