Reinforcement and Imitation Learning via Interactive No-Regret Learning
Stephane Ross and J. Andrew Bagnell
arXiv e-Print archive - 2014 via Local arXiv
Keywords: cs.LG, stat.ML
First published: 2014/06/23
Abstract: Recent work has demonstrated that problems where a learner's
predictions influence the input distribution it is tested on, particularly
imitation learning and structured prediction, can be naturally addressed by an interactive
approach and analyzed using no-regret online learning. These approaches to
imitation learning, however, neither require nor benefit from information about
the cost of actions. We extend existing results in two directions: first, we
develop an interactive imitation learning approach that leverages cost
information; second, we extend the technique to address reinforcement learning.
The results provide theoretical support for the commonly observed successes of
online approximate policy iteration. Our approach suggests a broad new family
of algorithms and provides a unifying view of existing techniques for imitation
and reinforcement learning.
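
The abstract leans on the notion of a no-regret online learner without defining it. As a rough illustration only, the sketch below runs Hedge (multiplicative weights), a standard no-regret algorithm, on a fixed loss sequence and reports the average regret against the best fixed expert in hindsight. This is not the paper's algorithm: Hedge is swapped in purely to show what "no-regret" means, and the function names `hedge` and `average_regret`, the toy loss sequence, and the learning-rate choice are all assumptions made for this example.

```python
import math


def hedge(loss_rounds, eta):
    """Run Hedge over a sequence of per-expert loss vectors (losses in [0, 1]).

    Returns (learner's cumulative expected loss, best fixed expert's cumulative loss).
    """
    n = len(loss_rounds[0])
    weights = [1.0] * n
    cumulative = [0.0] * n
    learner_loss = 0.0
    for losses in loss_rounds:
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected loss of sampling an expert from the current distribution.
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
        # Exponentially down-weight experts in proportion to their loss.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    return learner_loss, min(cumulative)


def average_regret(T, n=3):
    """Average regret of Hedge after T rounds on a hypothetical toy sequence."""
    # Assumed toy data: expert 0 always incurs 0.2; the others alternate 0 and 1.
    rounds = [[0.2] + [float((t + i) % 2) for i in range(1, n)] for t in range(T)]
    # Standard tuning for a known horizon T.
    eta = math.sqrt(8.0 * math.log(n) / T)
    learner, best = hedge(rounds, eta)
    return (learner - best) / T


if __name__ == "__main__":
    for T in (10, 100, 1000):
        bound = math.sqrt(math.log(3) / (2 * T))
        print(T, average_regret(T), bound)
```

With this learning rate, the textbook guarantee for Hedge is that average regret is at most sqrt(ln(n) / (2T)), which goes to zero as T grows. That vanishing average regret is the property the abstract's analysis relies on, here shown in a far simpler setting than interactive imitation or reinforcement learning.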