Reward learning from human preferences and demonstrations in Atari
Borja Ibarz, Jan Leike, Tobias Pohlen, Geoffrey Irving, Shane Legg, Dario Amodei
arXiv e-Print archive - 2018 via Local arXiv
Keywords:
cs.LG, cs.AI, cs.NE, stat.ML
First published: 2018/11/15
Abstract: To solve complex real-world problems with reinforcement learning, we cannot
rely on manually specified reward functions. Instead, we can have humans
communicate an objective to the agent directly. In this work, we combine two
approaches to learning from human feedback: expert demonstrations and
trajectory preferences. We train a deep neural network to model the reward
function and use its predicted reward to train a DQN-based deep reinforcement
learning agent on 9 Atari games. Our approach beats the imitation learning
baseline in 7 games and achieves strictly superhuman performance on 2 games
without using game rewards. Additionally, we investigate the goodness of fit of
the reward model, present some reward hacking problems, and study the effects
of noise in the human labels.
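
The core mechanism is a reward model fit to pairwise trajectory preferences with a Bradley-Terry likelihood, whose predicted reward then stands in for the game score when training the agent. The following is a minimal sketch of that idea, not the authors' code: the network architecture, the names (RewardNet, preference_loss), and the toy data are illustrative assumptions, and in the paper's setting demonstration segments can be automatically labelled as preferred over agent segments.

```python
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Maps a single (flattened) observation to a scalar reward estimate."""
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs).squeeze(-1)

def preference_loss(reward_net: RewardNet,
                    seg_a: torch.Tensor,    # (batch, seg_len, obs_dim)
                    seg_b: torch.Tensor,    # (batch, seg_len, obs_dim)
                    prefer_a: torch.Tensor  # (batch,) labels in {0.0, 0.5, 1.0}
                    ) -> torch.Tensor:
    """Bradley-Terry cross-entropy:
    P(a preferred over b) = exp(sum r_a) / (exp(sum r_a) + exp(sum r_b))."""
    r_a = reward_net(seg_a).sum(dim=1)  # total predicted reward of each segment
    r_b = reward_net(seg_b).sum(dim=1)
    logits = r_a - r_b                  # log-odds that segment a is preferred
    return nn.functional.binary_cross_entropy_with_logits(logits, prefer_a)

if __name__ == "__main__":
    obs_dim, seg_len, batch = 16, 25, 8
    net = RewardNet(obs_dim)
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    # Toy synthetic preference data; here segment a plays the role of a
    # demonstration segment labelled as preferred over an agent segment.
    seg_a = torch.randn(batch, seg_len, obs_dim)
    seg_b = torch.randn(batch, seg_len, obs_dim)
    prefer_a = torch.ones(batch)
    for _ in range(100):
        opt.zero_grad()
        loss = preference_loss(net, seg_a, seg_b, prefer_a)
        loss.backward()
        opt.step()
    # The trained net's predicted rewards would then replace the game score
    # when training a DQN-style agent.
```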