Adversarial Attacks on Neural Network Policies
Sandy Huang, Nicolas Papernot, Ian Goodfellow, Yan Duan, and Pieter Abbeel
arXiv e-Print archive - 2017
Keywords: cs.LG, cs.CR, stat.ML
First published: 2017/02/08

Abstract: Machine learning classifiers are known to be vulnerable to inputs maliciously constructed by adversaries to force misclassification. Such adversarial examples have been extensively studied in the context of computer vision applications. In this work, we show that adversarial attacks are also effective when targeting neural network policies in reinforcement learning. Specifically, we show that existing adversarial example crafting techniques can be used to significantly degrade the test-time performance of trained policies. Our threat model considers adversaries capable of introducing small perturbations to the raw input of the policy. We characterize the degree of vulnerability across tasks and training algorithms, for a subclass of adversarial-example attacks in white-box and black-box settings. Regardless of the learned task or training algorithm, we observe a significant drop in performance, even with small adversarial perturbations that do not interfere with human perception. Videos are available at http://rll.berkeley.edu/adversarial.
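The abstract does not spell out the crafting technique, but the class of attacks it describes can be illustrated with a fast gradient sign method (FGSM) style perturbation of a policy's observation. The sketch below is a minimal, hypothetical example, not the paper's implementation: the toy Policy network, observation dimensions, and epsilon value are illustrative assumptions.

```python
# Illustrative sketch of an FGSM-style attack on a policy's raw input.
# The Policy class, observation size, and epsilon are hypothetical choices.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Policy(nn.Module):
    """A toy feed-forward policy mapping a flattened observation to action logits."""
    def __init__(self, obs_dim=84 * 84, n_actions=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)

def fgsm_perturb(policy, obs, epsilon=0.01):
    """Craft an L-infinity-bounded adversarial observation.

    The attack treats the policy's own preferred action as the label and
    increases the cross-entropy loss for that action, nudging the policy
    away from what it would otherwise do.
    """
    obs = obs.clone().detach().requires_grad_(True)
    logits = policy(obs)
    preferred_action = logits.argmax(dim=-1)          # action the clean policy would take
    loss = F.cross_entropy(logits, preferred_action)
    loss.backward()
    # Step in the direction of the sign of the input gradient, bounded by epsilon.
    adversarial_obs = obs + epsilon * obs.grad.sign()
    return adversarial_obs.clamp(0.0, 1.0).detach()   # keep pixels in valid range

if __name__ == "__main__":
    policy = Policy()
    obs = torch.rand(1, 84 * 84)                      # stand-in for a raw game frame
    adv_obs = fgsm_perturb(policy, obs, epsilon=0.01)
    print("clean action:    ", policy(obs).argmax(dim=-1).item())
    print("perturbed action:", policy(adv_obs).argmax(dim=-1).item())
```

The key point mirrored from the abstract is that the perturbation budget epsilon can be kept small enough to be imperceptible to a human while still changing the actions the policy selects.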