Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers
Hadi Salman, Jerry Li, Ilya Razenshteyn, Pengchuan Zhang, Huan Zhang, Sébastien Bubeck, Greg Yang
Neural Information Processing Systems (NeurIPS), 2019
Salman et al. combine randomized smoothing with adversarial training, using an attack specifically designed against smoothed classifiers. They build on the randomized smoothing formulation of Cohen et al. [1]: Gaussian noise is sampled around the input (adversarial or clean) and the smoothed classifier takes a majority vote over the base classifier's predictions on these noisy samples. In [1], Cohen et al. show that this yields strong certified robustness bounds. In this paper, Salman et al. propose an adaptive attack against randomized smoothing: a PGD attack on the smoothed classifier that maximizes the cross-entropy loss of a soft version of the smoothed classifier (the expected softmax probabilities under Gaussian noise), since the hard majority vote is not differentiable. To make this objective tractable, the expectation is estimated with Monte Carlo samples in each PGD iteration. Based on this attack, they perform adversarial training, where adversarial examples are computed against the smoothed (and adversarially trained) classifier itself. In experiments, this approach outperforms the certified robustness obtained by Cohen et al. on several datasets.
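A minimal PyTorch sketch of such an attack is shown below, assuming a base classifier `model` that returns logits for an image batch in [0, 1]; the function name `smooth_adv_pgd`, the soft-probability estimator, and all hyperparameters are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def smooth_adv_pgd(model, x, y, sigma=0.25, eps=0.5, steps=10, n_samples=8):
    """PGD attack on a soft smoothed classifier (illustrative sketch).

    In each step, the smoothed classifier's class probabilities are estimated
    by averaging softmax outputs over Gaussian noise samples; the cross-entropy
    of this estimate is increased by gradient ascent, projected back onto an
    l2 ball of radius eps around the clean input x.
    """
    x_adv = x.clone().detach()
    step_size = 2.0 * eps / steps
    for _ in range(steps):
        x_adv.requires_grad_(True)
        # Monte Carlo estimate of the expected softmax under Gaussian noise.
        probs = sum(
            F.softmax(model(x_adv + sigma * torch.randn_like(x_adv)), dim=1)
            for _ in range(n_samples)
        ) / n_samples
        loss = F.nll_loss(torch.log(probs + 1e-12), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            # Normalized gradient ascent step, then projection onto the l2 ball.
            grad_norm = grad.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
            x_adv = x_adv + step_size * grad / grad_norm
            delta = x_adv - x
            delta_norm = delta.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
            x_adv = (x + delta * (eps / delta_norm).clamp(max=1.0)).clamp(0.0, 1.0)
        x_adv = x_adv.detach()
    return x_adv
```

For adversarial training, the base classifier would then be trained on Gaussian-noise-augmented versions of these adversarial examples, so that the smoothed classifier it induces becomes robust.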
[1] Jeremy M. Cohen, Elan Rosenfeld, and J. Zico Kolter. Certified Adversarial Robustness via Randomized Smoothing. arXiv:1902.02918, 2019.
Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).