First published: 2017/02/21 (5 years ago) Abstract: Recent advances in one-shot learning have produced models that can learn from
a handful of labeled examples, for passive classification and regression tasks.
This paper combines reinforcement learning with one-shot learning, allowing the
model to decide, during classification, which examples are worth labeling. We
introduce a classification task in which a stream of images are presented and,
on each time step, a decision must be made to either predict a label or pay to
receive the correct label. We present a recurrent neural network based
action-value function, and demonstrate its ability to learn how and when to
request labels. Through the choice of reward function, the model can achieve a
higher prediction accuracy than a similar model on a purely supervised task, or
trade prediction accuracy for fewer label requests.
The paper combines reinforcement learning with active learning to learn when to request labels to improve prediction accuracy.
- The model can either predict the label at time step $t$ or request it in the next time step, in form of a one-hot vector output of an LSTM with the previous label (if requested) and the current image as an input.
- A reward is issued based on the outcome of requesting labels (-0.05), or correctly (+1) or incorrectly(-1) predicting the label.
- The optimal strategy involves storing class embeddings and their labels in memory and only requesting labels if a unseen class is encountered.
The model is evaluated against the *Omniglot* dataset and learns a non-naive strategy to request fewer labels the more data of a class was encountered, using a learned uncertainty measure.
The magnitude of the reward for incorrect labeling decides on the amount of requested labels and can be used to maximize the accuracy during prediction.
## Active Learning
Active Learning is a special case of semi-supervised learning, which aims to reduce the amount of supervision needed during training. The model typically selects which datapoints to label by applying different metrics like most information content, highest uncertainty or other heuristics.
## Reinforcement Learning
Reinforcement learning agents try to learn an optimal policy $\pi^*(s_t)$ for a state $s_t$ at time $t$ that will maximize future rewards issued by the environment, by choosing an action $a_t$. The policy is represented by a function $Q^*(s_t, a_t)$, which can be approximated and learned in form of a neural network.