Model-Based Reinforcement Learning for Atari
Łukasz Kaiser, Mohammad Babaeizadeh, Piotr Miłoś, Błażej Osiński, Roy H. Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Ryan Sepassi, George Tucker, and Henryk Michalewski
arXiv e-Print archive, 2019
Keywords:
cs.LG, stat.ML
First published: 2019/03/01
Abstract: Model-free reinforcement learning (RL) can be used to learn effective
policies for complex tasks, such as Atari games, even from image observations.
However, this typically requires very large amounts of interaction --
substantially more, in fact, than a human would need to learn the same games.
How can people learn so quickly? Part of the answer may be that people can
learn how the game works and predict which actions will lead to desirable
outcomes. In this paper, we explore how video prediction models can similarly
enable agents to solve Atari games with orders of magnitude fewer interactions
than model-free methods. We describe Simulated Policy Learning (SimPLe), a
complete model-based deep RL algorithm based on video prediction models and
present a comparison of several model architectures, including a novel
architecture that yields the best results in our setting. Our experiments
evaluate SimPLe on a range of Atari games and achieve competitive results with
only 100K interactions between the agent and the environment (400K frames),
which corresponds to about two hours of real-time play.
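The abstract describes SimPLe as alternating between gathering real experience, fitting a video prediction world model, and training the policy inside that learned model. The sketch below is a minimal illustration of such a loop under stated assumptions: the function names, stubbed bodies, and per-iteration budgets are hypothetical (chosen only so the total roughly matches the ~100K real interactions quoted above), not the authors' implementation.

# A minimal sketch (assumptions, not the authors' code) of a SimPLe-style
# outer loop: gather real experience, fit a video prediction world model,
# then train the policy entirely inside the learned model.
from typing import Any, List, Tuple

Transition = Tuple[Any, int, float, Any]  # (frame, action, reward, next_frame)

def collect_real_experience(policy: Any, num_steps: int) -> List[Transition]:
    # Stub: would step the real Atari emulator with `policy` for `num_steps` interactions.
    return []

def train_world_model(world_model: Any, data: List[Transition]) -> Any:
    # Stub: would fit the video prediction model to the observed transitions.
    return world_model

def train_policy_in_model(policy: Any, world_model: Any, num_rollouts: int) -> Any:
    # Stub: would improve the policy (e.g. with a model-free learner such as PPO)
    # using rollouts generated by the learned world model instead of the real game.
    return policy

def simple_loop(policy: Any, world_model: Any,
                iterations: int = 15, real_steps_per_iter: int = 6400) -> Any:
    # 15 * 6400 = 96,000 real interactions, roughly the 100K budget in the abstract;
    # the exact split is an illustrative assumption.
    data: List[Transition] = []
    for _ in range(iterations):
        data += collect_real_experience(policy, real_steps_per_iter)
        world_model = train_world_model(world_model, data)
        policy = train_policy_in_model(policy, world_model, num_rollouts=1000)
    return policy

The key design point the abstract emphasizes is that the expensive policy optimization happens against the learned simulator, so the real environment is queried only in the comparatively small data-collection phase of each iteration.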