## Introduction
Two distinct research paradigms have studied how prior tasks or experiences can be used by an agent to inform future learning.
* Meta-learning: past experience is used to acquire a prior over model parameters or a learning procedure; it typically assumes that a set of meta-training tasks is available together up front.
* Online learning: a sequential setting where tasks are revealed one after another, but which typically aims for zero-shot generalization without any task-specific adaptation.
We argue that neither setting is ideal for studying continual lifelong learning. Meta-learning deals with learning to learn, but neglects the sequential and non-stationary aspects of the problem. Online learning offers an appealing theoretical framework, but does not generally consider how past experience can accelerate adaptation to a new task.
## Online Learning
Online learning focuses on regret minimization. The most standard notion of regret compares the agent's cumulative loss to that of the best fixed model in hindsight:
https://i.imgur.com/pbZG4kK.png
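The linked image presumably shows this quantity; written out, the standard (static) regret after $T$ rounds is

$$\mathrm{Regret}_T = \sum_{t=1}^{T} f_t(w_t) - \min_{w} \sum_{t=1}^{T} f_t(w),$$

where $f_t$ is the loss revealed at round $t$ and $w_t$ is the model the agent played.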
One way to minimize regret is Follow the Leader (FTL):
https://i.imgur.com/NCs73vG.png
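Written out (the linked image presumably shows the same rule), FTL plays at each round the parameters that minimize the cumulative loss over all tasks seen so far:

$$w_{t+1} = \arg\min_{w} \sum_{k=1}^{t} f_k(w).$$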
## Online Meta-Learning Setting
Let $U_t$ be the update procedure for task $t$.
For example, in MAML:
https://i.imgur.com/Q4I4HkD.png
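Assuming a single inner gradient step with step size $\alpha$ on an inner loss $\hat f_t$ computed from task $t$'s support/training data (the standard MAML form), the update procedure is

$$U_t(w) = w - \alpha \, \nabla \hat f_t(w).$$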
The overall protocol for the setting is as follows (a toy code sketch follows the list):
1. At round $t$, the agent chooses a model defined by parameters $w_t$.
2. The world simultaneously chooses a task defined by a loss function $f_t$.
3. The agent obtains access to the update procedure $U_t$ and uses it to update its parameters as $\tilde w_t = U_t(w_t)$.
4. The agent incurs loss $f_t(\tilde w_t)$. Advance to round $t + 1$.
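A minimal, self-contained sketch of this interaction loop on toy quadratic tasks; the quadratic losses, the one-step $U_t$, and the fixed meta-parameters are illustrative assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
ALPHA = 0.1  # inner step size used by the update procedure U_t

def make_task():
    """Toy task: quadratic loss f_t(w) = ||w - c_t||^2 with a random center c_t."""
    c = rng.normal(size=2)
    loss = lambda w: float(np.sum((w - c) ** 2))
    grad = lambda w: 2.0 * (w - c)
    return loss, grad

def U(w, grad):
    """Update procedure U_t: one gradient step on the task's (support-set) loss."""
    return w - ALPHA * grad(w)

# Protocol: at round t the agent plays w_t, the world reveals f_t,
# the agent adapts to w~_t = U_t(w_t) and incurs loss f_t(w~_t).
w = np.zeros(2)      # the agent's meta-parameters (kept fixed in this naive sketch)
total_loss = 0.0
for t in range(10):
    loss_t, grad_t = make_task()     # 2. world chooses task f_t
    w_tilde = U(w, grad_t)           # 3. agent adapts: w~_t = U_t(w_t)
    total_loss += loss_t(w_tilde)    # 4. agent incurs post-adaptation loss
print("cumulative post-adaptation loss:", total_loss)
```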
The agent's goal is to minimize regret over the rounds, where performance at each round is measured after task-specific adaptation.
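Concretely, regret here is measured on the post-update parameters and compared against the best meta-initialization in hindsight:

$$\mathrm{Regret}_T = \sum_{t=1}^{T} f_t\big(U_t(w_t)\big) - \min_{w} \sum_{t=1}^{T} f_t\big(U_t(w)\big).$$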
Achieving sublinear regret means the agent keeps improving: its average per-round performance converges to that of the comparator, i.e., the best meta-learned model trained jointly on all tasks in hindsight.
## Algorithm and Analysis
Follow the meta-leader (FTML):
https://i.imgur.com/qWb9g8Q.png
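Written out (the linked image presumably shows the same rule), FTML plays the parameters that would have been the best meta-initialization for all tasks seen so far:

$$w_{t+1} = \arg\min_{w} \sum_{k=1}^{t} f_k\big(U_k(w)\big).$$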
FTML’s regret is sublinear under some regularity assumptions on the task losses (e.g., strong convexity and smoothness).
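To make the algorithm concrete, below is a toy, self-contained sketch of FTML on quadratic tasks, approximating each arg min with a few meta-gradient steps. The quadratic losses, step sizes, and analytic meta-gradient are illustrative assumptions for this sketch; the paper's practical version approximates the arg min stochastically rather than solving it exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
ALPHA = 0.1       # inner (adaptation) step size
BETA = 0.05       # outer (meta) step size
META_STEPS = 50   # gradient steps used to approximate each FTML arg min

# Toy tasks: f_t(w) = ||w - c_t||^2, centers drawn around a shared mean,
# so tasks are related and a good meta-initialization exists.
centers = 3.0 + rng.normal(scale=0.5, size=(20, 2))
loss = lambda w, c: float(np.sum((w - c) ** 2))
grad = lambda w, c: 2.0 * (w - c)

def adapt(w, c):
    """U_t: one inner gradient step on task t's loss."""
    return w - ALPHA * grad(w, c)

def meta_grad(w, c):
    """Gradient of f_t(U_t(w)) w.r.t. w. For this quadratic toy the Jacobian
    of U_t is (1 - 2*ALPHA) * I; in general one backpropagates through U_t."""
    return (1.0 - 2.0 * ALPHA) * grad(adapt(w, c), c)

w = np.zeros(2)   # meta-parameters w_1
for t, c_t in enumerate(centers):
    # Incur the post-adaptation loss f_t(U_t(w_t)) on the newly revealed task.
    print(f"round {t}: post-adaptation loss = {loss(adapt(w, c_t), c_t):.3f}")
    # FTML: approximately solve w_{t+1} = argmin_w sum_{k<=t} f_k(U_k(w)).
    seen = centers[: t + 1]
    for _ in range(META_STEPS):
        w = w - BETA * np.mean([meta_grad(w, c) for c in seen], axis=0)
```

Because the toy tasks are related, the post-adaptation loss printed each round should trend downward as the meta-initialization improves, which is the behavior sublinear regret captures.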