First published: 2019/09/10
Abstract: A core capability of intelligent systems is the ability to quickly learn new
tasks by drawing on prior experience. Gradient (or optimization) based
meta-learning has recently emerged as an effective approach for few-shot
learning. In this formulation, meta-parameters are learned in the outer loop,
while task-specific models are learned in the inner loop, using only a small
amount of data from the current task. A key challenge in scaling these
approaches is the need to differentiate through the inner loop learning
process, which can impose considerable computational and memory burdens. By
drawing upon implicit differentiation, we develop the implicit MAML algorithm,
which depends only on the solution to the inner-level optimization and not the
path taken by the inner loop optimizer. This effectively decouples the
meta-gradient computation from the choice of inner loop optimizer. As a result,
our approach is agnostic to the choice of inner loop optimizer and can
gracefully handle many gradient steps without vanishing gradients or memory
constraints. Theoretically, we prove that implicit MAML can compute accurate
meta-gradients with a memory footprint that is, up to small constant factors,
no more than that required to compute a single inner loop gradient, and with
no overall increase in the total computational cost. Experimentally, we show
that these benefits of implicit MAML translate into empirical gains on few-shot
image recognition benchmarks.
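To make the core idea concrete, below is a minimal sketch (not the authors' reference implementation) of an implicit meta-gradient computation. It assumes the paper's proximally regularized inner problem, where the adapted parameters phi* minimize the task training loss plus (lambda/2)||phi - theta||^2, and approximates the resulting meta-gradient (I + (1/lambda) H)^{-1} grad L_test(phi*) with conjugate gradient and Hessian-vector products, so only phi* is needed, not the inner optimization path. The function names, toy losses, and hyperparameters are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def implicit_meta_grad(l_train, l_test, phi_star, lam, cg_iters=20):
    """Approximate an iMAML-style meta-gradient at the inner solution phi_star."""
    # Gradient of the outer (test) loss at the adapted parameters.
    g_test = jax.grad(l_test)(phi_star)

    def matvec(v):
        # Hessian-vector product H v, with H the Hessian of l_train at phi_star.
        hvp = jax.grad(lambda p: jnp.vdot(jax.grad(l_train)(p), v))(phi_star)
        # Apply (I + H / lam) to v without ever forming H explicitly.
        return v + hvp / lam

    # Conjugate gradient approximately solves (I + H/lam) x = g_test,
    # yielding the implicit meta-gradient from phi_star alone.
    x, _ = jax.scipy.sparse.linalg.cg(matvec, g_test, maxiter=cg_iters)
    return x

# Toy usage with quadratic losses; names and numbers are illustrative only.
if __name__ == "__main__":
    theta = jnp.array([1.0, -2.0])
    l_train = lambda p: jnp.sum((p - 3.0) ** 2)   # inner-loop (training) loss
    l_test = lambda p: jnp.sum((p + 1.0) ** 2)    # outer-loop (test) loss
    phi_star = theta  # stand-in for the actual inner-loop solution
    print(implicit_meta_grad(l_train, l_test, phi_star, lam=1.0))
```

Because only phi_star and Hessian-vector products are required, memory does not grow with the number of inner gradient steps, which is the decoupling of the meta-gradient from the inner loop optimizer that the abstract describes.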