Learning to learn by gradient descent by gradient descent
Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W. Hoffman, David Pfau, Tom Schaul, Nando de Freitas
arXiv e-Print archive, 2016
Keywords: cs.NE, cs.LG
First published: 2016/06/14
Abstract: The move from hand-designed features to learned features in machine learning
has been wildly successful. In spite of this, optimization algorithms are still
designed by hand. In this paper we show how the design of an optimization
algorithm can be cast as a learning problem, allowing the algorithm to learn to
exploit structure in the problems of interest in an automatic way. Our learned
algorithms, implemented by LSTMs, outperform generic, hand-designed competitors
on the tasks for which they are trained, and also generalize well to new tasks
with similar structure. We demonstrate this on a number of tasks, including
simple convex problems, training neural networks, and styling images with
neural art.
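
The abstract compresses the core construction: an LSTM optimizer g with parameters φ proposes the update θ_{t+1} = θ_t + g(∇f(θ_t)), and φ is trained by backpropagating the optimizee's loss through an unrolled optimization trajectory. The sketch below illustrates this on random quadratics (the paper's simplest task) using the coordinatewise architecture, where one small LSTM is shared across all optimizee parameters. All dimensions, hyperparameters, and names here are illustrative assumptions, not the paper's exact settings.

```python
# Minimal sketch of a learned (coordinatewise LSTM) optimizer, assuming
# PyTorch. Hyperparameters below are illustrative, not the paper's.
import torch
import torch.nn as nn

DIM, HIDDEN, UNROLL, META_STEPS = 10, 20, 20, 300

class LSTMOptimizer(nn.Module):
    """One small LSTM applied independently to each gradient coordinate,
    mirroring the paper's coordinatewise architecture."""
    def __init__(self, hidden=HIDDEN):
        super().__init__()
        self.lstm = nn.LSTMCell(1, hidden)   # input: one gradient coordinate
        self.out = nn.Linear(hidden, 1)      # output: one update coordinate

    def forward(self, grad, state):
        h, c = self.lstm(grad.unsqueeze(-1), state)  # grad: (DIM,) -> (DIM, 1)
        return self.out(h).squeeze(-1), (h, c)

def quadratic_loss(theta, W, y):
    # Random quadratic f(theta) = ||W theta - y||^2, the paper's simplest task.
    return ((W @ theta - y) ** 2).sum()

opt_net = LSTMOptimizer()
meta_opt = torch.optim.Adam(opt_net.parameters(), lr=1e-2)

for step in range(META_STEPS):
    W, y = torch.randn(DIM, DIM), torch.randn(DIM)   # sample a fresh problem
    theta = torch.randn(DIM, requires_grad=True)     # optimizee parameters
    state = (torch.zeros(DIM, HIDDEN), torch.zeros(DIM, HIDDEN))
    meta_loss = 0.0
    for t in range(UNROLL):
        loss = quadratic_loss(theta, W, y)
        meta_loss = meta_loss + loss                 # sum losses along the trajectory
        grad, = torch.autograd.grad(loss, theta, retain_graph=True)
        # As in the paper, the gradient input is treated as independent of
        # the optimizer parameters (no second-order terms).
        update, state = opt_net(grad.detach(), state)
        theta = theta + update                       # apply the learned update
    meta_opt.zero_grad()
    meta_loss.backward()                             # backprop through the unrolled trajectory
    meta_opt.step()
    if step % 50 == 0:
        print(f"meta-step {step}: mean trajectory loss {meta_loss.item() / UNROLL:.3f}")
```

Because meta-training minimizes the loss summed over the whole unrolled trajectory, the LSTM is rewarded for fast progress at every step, not just for the final iterate; generalizing to new tasks then amounts to running the trained LSTM on a fresh optimizee.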