[link]
Bechtle et al. propose meta learning via learned loss ($ML^3$) and derive and empirically evaluate the framework on classification, regression, modelbased and modelfree reinforcement learning tasks. The problem is formalized as learning parameters $\Phi$ of a meta loss function $M_\phi$ that computes loss values $L_{learned} = M_{\Phi}(y, f_{\theta}(x))$. Following the outerinner loop meta algorithm design the learned loss $L_{learned}$ is used to update the parameters of the learner in the inner loop via gradient descent: $\theta_{new} = \theta  \alpha \nabla_{\theta}L_{learned} $. The key contribution of the paper is the way to construct a differentiable learning signal for the loss parameters $\Phi$. The framework requires to specify a task loss $L_T$ during meta train time, which can be for example the mean squared error for regression tasks. After updating the model parameters to $\theta_{new}$ the task loss is used to measure how much learning progress has been made with loss parameters $\Phi$. The key insight is the decomposition via chainrule of $\nabla_{\Phi} L_T(y, f_{\theta_{new}})$: $\nabla_{\Phi} L_T(y, f_{\theta_{new}}) = \nabla_f L_t \nabla_{\theta_{new}}f_{\theta_{new}} \nabla_{\Phi} \theta_{new} = \nabla_f L_t \nabla_{\theta_{new}}f_{\theta_{new}} [\theta  \alpha \nabla_{\theta} \mathbb{E}[M_{\Phi}(y, f_{\theta}(x))]]$. This allows to update the loss parameters with gradient descent as: $\Phi_{new} = \Phi  \eta \nabla_{\Phi} L_T(y, f_{\theta_{new}})$. This update rules yield the following $ML^3$ algorithm for supervised learning tasks: https://i.imgur.com/tSaTbg8.png For reinforcement learning the task loss is the expected future reward of policies induced by the policy $\pi_{\theta}$, for modelbased rl with respect to the approximate dynamics model and for the model free case a system independent surrogate: $L_T(\pi_{\theta_{new}}) = \mathbb{E}_{\pi_{\theta_{new}}} \left[ R(\tau_{\theta_{new}}) \log \pi_{\theta_{new}}(\tau_{new})\right] $. The allows further to incorporate extra information via an additional loss term $L_{extra}$ and to consider the augmented task loss $\beta L_T + \gamma L_{extra} $, with weights $\beta, \gamma$ at train time. Possible extra loss terms are used to add physics priors, encouragement of exploratory behavior or to incorporate expert demonstrations. The experiments show that this, at test time unavailable information, is retained in the shape of the loss landscape. The paper is packed with insightful experiments and shows that the learned loss function:  yields in regression and classification better accuracies at train and test tasks  generalizes well and speeds up learning in model based rl tasks  yields better generalization and faster learning in model free rl  is agnostic across a bunch of evaluated architectures (2,3,4,5 layers)  with incorporated extra knowledge yields better performance than without and is superior to alternative approaches like iLQR in a MountainCar task. The paper introduces a promising alternative, by learning the loss parameters, to MAML like approaches that learn the model parameters. It would be interesting to see if the learned loss function generalizes better than learned model parameters to a broader distribution of tasks like the metaworld tasks.
Your comment:
