kangcheng's profile - ShortScience.org

Understanding Black-box Predictions via Influence Functions
Koh, Pang Wei and Liang, Percy
International Conference on Machine Learning - 2017 via Local Bibsonomy
Keywords: dblp

Summary by kangcheng 7 years ago

**Goal**: identify the training points most responsible for a given prediction.

Given training points $z_1, \dots, z_n$, let the empirical risk be $\frac{1}{n}\sum_{i=1}^n L(z_i, \theta)$, and let $\hat{\theta}$ denote its minimizer.

The influence function lets us compute how the parameters change if a training point $z$ is upweighted by some small $\epsilon$:
$$\hat{\theta}_{\epsilon, z} := \arg \min_{\theta \in \Theta} \frac{1}{n}\sum_{i=1}^n L(z_i, \theta) + \epsilon L(z, \theta)$$

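$$\mathcal{I}_{\text{up, params}}(z) := \left.\frac{d\hat{\theta}_{\epsilon, z}}{d\epsilon}\right|_{\epsilon=0} = -H_{\hat{\theta}}^{-1} \nabla_\theta L(z, \hat{\theta})$$

where $H_{\hat{\theta}} := \frac{1}{n}\sum_{i=1}^n \nabla_\theta^2 L(z_i, \hat{\theta})$ is the Hessian of the empirical risk at its minimizer $\hat{\theta}$, assumed to be positive definite.

$\mathcal{I}_{\text{up, params}}(z)$ shows how upweighting one point $z$ affects the estimate of the parameters $\theta$.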
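
As a sanity check of this formula, here is a minimal sketch on ridge regression, where the upweighted minimizer has a closed form and can be compared against the influence estimate. The per-example loss $L(z, \theta) = \frac{1}{2}(x^\top\theta - y)^2 + \frac{\lambda}{2}\|\theta\|^2$, the synthetic data, and the helper names (`fit`, `grad_loss`, `influence_up_params`) are assumptions chosen for illustration; none of them come from the paper.

```python
import numpy as np

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
n, d, lam = 50, 3, 0.1
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

def fit(eps=0.0, idx=None):
    """Closed-form minimizer of (1/n) sum_i L(z_i, theta) + eps * L(z_idx, theta)."""
    A = X.T @ X / n + lam * np.eye(d)
    b = X.T @ y / n
    if idx is not None:
        x, t = X[idx], y[idx]
        A = A + eps * (np.outer(x, x) + lam * np.eye(d))
        b = b + eps * t * x
    return np.linalg.solve(A, b)

def grad_loss(x, t, theta):
    """Gradient of the assumed per-example ridge loss with respect to theta."""
    return x * (x @ theta - t) + lam * theta

theta_hat = fit()
H = X.T @ X / n + lam * np.eye(d)  # Hessian of the empirical risk

def influence_up_params(idx):
    """I_up,params(z_idx) = -H^{-1} grad L(z_idx, theta_hat)."""
    return -np.linalg.solve(H, grad_loss(X[idx], y[idx], theta_hat))

# Compare with a finite-difference derivative of the upweighted minimizer.
eps, idx = 1e-6, 7
fd = (fit(eps, idx) - theta_hat) / eps
infl = influence_up_params(idx)
print(np.max(np.abs(fd - infl)))
```

The finite-difference vector `fd` and the influence estimate `infl` should agree up to terms of order `eps`, since the formula is exactly the derivative at $\epsilon = 0$.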

Furthermore, the chain rule gives the effect of upweighting $z$ on the loss at a test point $z_{\text{test}}$:
$$\mathcal{I}_{\text{up, loss}}(z, z_{\text{test}}) = \nabla_\theta L(z_{\text{test}}, \hat{\theta})^\top \mathcal{I}_{\text{up, params}}(z)$$ 
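
To illustrate how this quantity might be used, the sketch below scores every training point by its influence on one test loss, again on a ridge regression toy problem. Removing a point $z$ is the same as upweighting it by $\epsilon = -\frac{1}{n}$, so $-\frac{1}{n}\mathcal{I}_{\text{up, loss}}(z, z_{\text{test}})$ serves as a first-order prediction of the leave-one-out change in test loss. The loss, data, test point, and all function names here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lam = 40, 3, 0.1
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)
x_test, y_test = rng.normal(size=d), 0.3  # arbitrary test point

def loss(x, t, theta):
    return 0.5 * (x @ theta - t) ** 2 + 0.5 * lam * theta @ theta

def grad_loss(x, t, theta):
    return x * (x @ theta - t) + lam * theta

def fit(eps=0.0, idx=0):
    """Closed-form minimizer of (1/n) sum_i L(z_i, .) + eps * L(z_idx, .)."""
    x, t = X[idx], y[idx]
    A = X.T @ X / n + lam * np.eye(d) + eps * (np.outer(x, x) + lam * np.eye(d))
    b = X.T @ y / n + eps * t * x
    return np.linalg.solve(A, b)

theta_hat = fit()
H = X.T @ X / n + lam * np.eye(d)

# Solve H s = grad L(z_test) once and reuse it for every training point.
s_test = np.linalg.solve(H, grad_loss(x_test, y_test, theta_hat))
train_grads = X * (X @ theta_hat - y)[:, None] + lam * theta_hat  # shape (n, d)
influence = -train_grads @ s_test  # I_up,loss(z_i, z_test) for each i

# Leave-one-out: predicted change (-influence / n) vs. actual retraining.
base = loss(x_test, y_test, theta_hat)
predicted = -influence / n
actual = np.array([loss(x_test, y_test, fit(-1.0 / n, i)) for i in range(n)]) - base

most_influential = np.argsort(-np.abs(influence))[:5]
print(most_influential, np.corrcoef(predicted, actual)[0, 1])
```

Solving for `s_test` first means each additional training point costs only one dot product, which is why the test-side solve is shared across the whole training set.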

Besides upweighting a training point, we can also estimate how the parameters change when a training point is perturbed. Let $z_\delta$ denote the point $z = (x, y)$ with its input perturbed to $x + \delta$. Moving $\epsilon$ mass from $z$ onto $z_\delta$ gives
$$\left.\frac{d\hat{\theta}_{\epsilon, z_\delta, -z}}{d\epsilon}\right|_{\epsilon=0} = \mathcal{I}_{\text{up, params}}(z_\delta) - \mathcal{I}_{\text{up, params}}(z)$$
This measures how a perturbation $\delta$ to training point $z$ affects the parameter estimate $\hat{\theta}$.
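
The perturbation formula can be checked numerically in the same way. The sketch below assumes a ridge regression loss, synthetic data, and a small random $\delta$ applied to the input of one training point; `fit_moved` is a hypothetical helper that minimizes the risk after moving $\epsilon$ mass from $z$ to $z_\delta$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, lam = 60, 4, 0.1
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

H = X.T @ X / n + lam * np.eye(d)  # Hessian of the empirical risk
b = X.T @ y / n
theta_hat = np.linalg.solve(H, b)

def grad_loss(x, t, theta):
    """Gradient of the assumed loss 0.5*(x.theta - t)^2 + 0.5*lam*|theta|^2."""
    return x * (x @ theta - t) + lam * theta

# Perturb the input of one training point: z = (x, t), z_delta = (x + delta, t).
idx = 3
x, t = X[idx], y[idx]
delta = 0.05 * rng.normal(size=d)
x_delta = x + delta

# I_up,params(z_delta) - I_up,params(z)
pert_influence = -np.linalg.solve(
    H, grad_loss(x_delta, t, theta_hat) - grad_loss(x, t, theta_hat)
)

def fit_moved(eps):
    """Minimizer of risk + eps * L(z_delta, .) - eps * L(z, .); the lam terms cancel."""
    A = H + eps * (np.outer(x_delta, x_delta) - np.outer(x, x))
    rhs = b + eps * t * (x_delta - x)
    return np.linalg.solve(A, rhs)

eps = 1e-6
fd = (fit_moved(eps) - theta_hat) / eps
print(np.max(np.abs(fd - pert_influence)))
```

Choosing $\epsilon = \frac{1}{n}$ in this construction corresponds to replacing $z$ with $z_\delta$ in the training set, so $\frac{1}{n}$ times `pert_influence` is a first-order estimate of the resulting parameter change.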

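Section 3 of the paper describes practical techniques for computing influence efficiently, such as using Hessian-vector products so that $H_{\hat{\theta}}^{-1}$ never has to be formed explicitly.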
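
One ingredient of that section is solving $H_{\hat{\theta}} s = v$ with conjugate gradients, using only Hessian-vector products. The following is a minimal sketch of that idea rather than the authors' implementation: it assumes a ridge regression risk, for which the Hessian-vector product is $X^\top(Xv)/n + \lambda v$, and the function names `hvp` and `conjugate_gradient` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, lam = 200, 20, 0.1
X = rng.normal(size=(n, d))

def hvp(v):
    """Hessian-vector product (X^T X / n + lam * I) v, without building the matrix."""
    return X.T @ (X @ v) / n + lam * v

def conjugate_gradient(hvp, b, tol=1e-10, max_iter=1000):
    """Solve H x = b for symmetric positive definite H, given only v -> H v."""
    x = np.zeros_like(b)
    r = b.copy()  # residual b - H x, with x = 0 initially
    p = r.copy()  # search direction
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        x += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# In the influence setting, v would be the test-loss gradient; here it is random.
v = rng.normal(size=d)
s = conjugate_gradient(hvp, v)

# Explicit Hessian, built only to check the result on this small problem.
H = X.T @ X / n + lam * np.eye(d)
print(np.max(np.abs(s - np.linalg.solve(H, v))))
```

For a model with $p$ parameters, each call to `hvp` costs about as much as a gradient evaluation, whereas forming and inverting the Hessian would need $O(p^2)$ memory and $O(p^3)$ time.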

This set of tools can be used for interpretable machine learning tasks, such as debugging a model or finding mislabeled training examples.

Do CIFAR-10 Classifiers Generalize to CIFAR-10?
Recht, Benjamin and Roelofs, Rebecca and Schmidt, Ludwig and Shankar, Vaishaal
arXiv e-Print archive - 2018 via Local Bibsonomy
Keywords: dblp

Summary by kangcheng 7 years ago

TL;DR: although researchers as a whole are overfitting to the benchmark dataset, the relative ranking of the classifiers is stable. In some sense, the overfitting to datasets that happens in computer vision research is not too harmful.
