Evolved Policy Gradients
Rein Houthooft, Richard Y. Chen, Phillip Isola, Bradly C. Stadie, Filip Wolski, Jonathan Ho, Pieter Abbeel
arXiv e-Print archive, 2018
Keywords: cs.LG, cs.AI
First published: 2018/02/13

Abstract: We propose a metalearning approach for learning gradient-based reinforcement learning (RL) algorithms. The idea is to evolve a differentiable loss function, such that an agent which optimizes its policy to minimize this loss will achieve high rewards. The loss is parametrized via temporal convolutions over the agent's experience. Because this loss is highly flexible in its ability to take into account the agent's history, it enables fast task learning. Empirical results show that our evolved policy gradient algorithm (EPG) achieves faster learning on several randomized environments compared to an off-the-shelf policy gradient method. We also demonstrate that EPG's learned loss can generalize to out-of-distribution test-time tasks, and exhibits qualitatively different behavior from other popular metalearning algorithms.
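
Since the abstract only sketches the method, here is a minimal, self-contained sketch of the core idea as I read it: an outer evolutionary loop perturbs the parameters of a learned, differentiable loss (a temporal convolution over the agent's experience), while an inner loop trains a policy purely by minimizing that loss; fitness is the true return achieved after inner-loop training. The toy environment, network sizes, seeds, and hyperparameters below are illustrative assumptions on my part, not the authors' implementation.

```python
# Minimal EPG-style sketch: ES outer loop over the parameters of a learned loss,
# gradient-descent inner loop on that loss. All specifics are illustrative.
import numpy as np
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, HORIZON = 3, 2, 32

class ToyEnv:
    """Stand-in task: reward for matching actions to a hidden target vector."""
    def __init__(self, seed):
        rng = np.random.RandomState(seed)
        self.target = torch.tensor(rng.uniform(-1, 1, size=ACT_DIM), dtype=torch.float32)
    def rollout(self, policy):
        obs = torch.zeros(HORIZON, OBS_DIM)
        acts = policy(obs)                                  # (HORIZON, ACT_DIM)
        rews = -((acts - self.target) ** 2).sum(-1)         # (HORIZON,)
        return obs, acts, rews

class LearnedLoss(nn.Module):
    """Temporal convolution over (obs, action, reward) sequences -> scalar loss."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(OBS_DIM + ACT_DIM + 1, 8, kernel_size=5, padding=2)
        self.head = nn.Linear(8, 1)
    def forward(self, obs, acts, rews):
        x = torch.cat([obs, acts, rews.unsqueeze(-1)], dim=-1).T.unsqueeze(0)
        h = torch.relu(self.conv(x)).mean(dim=-1)           # pool over time
        return self.head(h).squeeze()

def inner_train(loss_fn, env, steps=20):
    """Train a fresh policy by minimizing the learned loss; return true final return."""
    policy = nn.Sequential(nn.Linear(OBS_DIM, 16), nn.Tanh(), nn.Linear(16, ACT_DIM))
    opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
    for _ in range(steps):
        obs, acts, rews = env.rollout(policy)
        loss = loss_fn(obs, acts, rews.detach())            # learned loss is the only objective
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                                   # fitness = true return after training
        return env.rollout(policy)[2].sum().item()

def flat(params):
    return torch.cat([p.data.view(-1) for p in params])

def unflat(params, vec):
    i = 0
    for p in params:
        p.data.copy_(vec[i:i + p.numel()].view_as(p)); i += p.numel()

loss_fn = LearnedLoss()
theta = flat(loss_fn.parameters())
sigma, alpha, pop = 0.05, 0.01, 8
for gen in range(5):                                        # outer loop: evolution strategies
    eps = torch.randn(pop, theta.numel())
    fitness = []
    for e in eps:
        unflat(loss_fn.parameters(), theta + sigma * e)     # perturbed loss parameters
        fitness.append(inner_train(loss_fn, ToyEnv(seed=gen)))
    f = torch.tensor(fitness)
    f = (f - f.mean()) / (f.std() + 1e-8)                   # normalize fitness
    theta = theta + alpha / (pop * sigma) * (eps.T @ f)     # ES gradient estimate
    print(f"generation {gen}: mean return {np.mean(fitness):.3f}")
```

Note that in this sketch the inner loop never optimizes the environment reward directly; reward appears only as an input feature to the learned loss, which (as I understand the paper) is what allows the loss to encode history-dependent learning behavior rather than a fixed policy-gradient surrogate.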