[link]
The goal of this work is to edit the model's weights given new edit pairs $(x_e, y_e)$ at test time. They achieve this by learning a "model editor network" that takes a fine-tuning gradient computed from $(x_e, y_e)$ and transforms it into a weight update:

$$f(\nabla W_l) \rightarrow \tilde{\nabla} W_l$$

The editor network is conditioned on the layer whose update it is predicting via a FiLM-style scale and shift. It is trained on a small set of examples ($D^{tr}_{edit}$). The paper states that this dataset contains edits similar to "the types of edits that will be made," which is interesting because it builds a generalization limitation into the kinds of edits the method can handle. An extra loss term prevents unintended changes for other inputs to the model (called $x_{loc}$) by keeping the post-edit predictions the same as the pre-edit ones:

$$L_{loc} = KL\left(p_{\theta_W}(\cdot \mid x_{loc}) \,\|\, p_{\theta_{\tilde{W}}}(\cdot \mid x_{loc})\right)$$

Some intuition for why this works: the editor network $f$ approximates a full-dataset gradient from just a single example, which makes editing far more efficient. It can also damp changes to elements of the weight matrix that were disruptive to the loss during the editor's training, information that would otherwise require many training examples to uncover.
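To make the moving parts concrete, here is a minimal PyTorch sketch of one editing step, assuming a small classifier and a single edited layer. This is not the paper's implementation: `EditorNetwork`, `edit_step`, `edit_lr`, and the choice to operate on the flattened gradient are illustrative assumptions (a flattened gradient would not scale to real transformer layers).

```python
# Minimal sketch, assuming a PyTorch classifier and a single edited layer.
# EditorNetwork / edit_step / edit_lr are hypothetical names, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EditorNetwork(nn.Module):
    """Maps a raw fine-tuning gradient to an edited gradient, conditioned on
    which layer it came from via a FiLM-style scale and shift."""
    def __init__(self, grad_dim: int, num_layers: int, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(grad_dim, hidden), nn.ReLU(), nn.Linear(hidden, grad_dim)
        )
        # Per-layer FiLM parameters: scale (gamma) and shift (beta).
        self.gamma = nn.Embedding(num_layers, grad_dim)
        self.beta = nn.Embedding(num_layers, grad_dim)

    def forward(self, grad_flat: torch.Tensor, layer_id: torch.Tensor) -> torch.Tensor:
        h = self.mlp(grad_flat)
        return self.gamma(layer_id) * h + self.beta(layer_id)  # scale-and-shift


def edit_step(model, editor, layer, layer_id, x_e, y_e, x_loc, edit_lr=1e-3):
    """One editing step: fine-tuning gradient on (x_e, y_e) -> editor -> weight
    update, plus the locality KL measured on unrelated inputs x_loc."""
    # 1. Fine-tuning gradient of the edit loss w.r.t. the target layer's weight.
    edit_loss = F.cross_entropy(model(x_e), y_e)
    (grad_W,) = torch.autograd.grad(edit_loss, layer.weight)

    # 2. Transform the raw gradient into an edited gradient.
    grad_tilde = editor(grad_W.flatten().unsqueeze(0), layer_id).view_as(grad_W)

    # Pre-edit predictions on x_loc (the distribution we want preserved).
    with torch.no_grad():
        log_p_pre = F.log_softmax(model(x_loc), dim=-1)

    # 3. Apply the edited gradient as a plain gradient-descent update.
    with torch.no_grad():
        layer.weight -= edit_lr * grad_tilde

    # 4. Locality loss: KL(pre-edit || post-edit) on x_loc should stay near zero.
    log_p_post = F.log_softmax(model(x_loc), dim=-1)
    loc_loss = F.kl_div(log_p_post, log_p_pre, log_target=True, reduction="batchmean")
    # Note: the update above is applied under no_grad, so loc_loss is only a
    # diagnostic here; training the editor end-to-end would need a
    # differentiable (functional) weight update.
    return loc_loss
```

A toy invocation, just to show the shapes involved (all values here are made up):

```python
model = nn.Linear(16, 4)                       # acts as both "model" and edited layer
editor = EditorNetwork(grad_dim=4 * 16, num_layers=1)
x_e, y_e = torch.randn(1, 16), torch.tensor([2])
x_loc = torch.randn(8, 16)
print(edit_step(model, editor, model, torch.tensor([0]), x_e, y_e, x_loc))
```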