Black-box Adversarial Attacks with Limited Queries and Information on ShortScience.org

arxiv.org
arxiv-vanity.com
scholar.google.com

Black-box Adversarial Attacks with Limited Queries and Information
Andrew Ilyas and Logan Engstrom and Anish Athalye and Jessy Lin
arXiv e-Print archive - 2018 via Local arXiv
Keywords: cs.CV, cs.CR, stat.ML
more

Summaries/Notes 1

[link] Summary by David Stutz 5 years ago

Ilyas et al. propose three query-efficient black-box adversarial example attacks using distribution-based gradient estimation. In particular, their simplest attacks involves estimating the gradient locally using a search distribution:

$ \nabla_x \mathbb{E}_{\pi(\theta|x)} [F(\theta)] = \mathbb{E}_{\pi(\theta|x)} [F(\theta) \nabla_x \log(\pi(\theta|x))]$

where $F(\cdot)$ is a loss function – e.g., using the cross-entropy loss which is maximized to obtain an adversarial example. The above equation, using a Gaussian noise search distribution leads to a simple approximator for the gradient:

$\nabla \mathbb{E}[F(\theta)] = \frac{1}{\sigma n} \sum_{i = 1}^n \delta_i F(\theta + \sigma \delta_i)$

where $\sigma$ is the search variance and $\delta_i$ are sampled from a unit Gaussian. This scheme can then be applied as part of the projected gradient descent white-box attacks to obtain adversarial examples.

The above attack assumes that the black-box network provides probability outputs in order to compute the loss $F$. In the remainder of the paper, the authors also generalize this approach to the label-only case, where the network only provides the top $k$ labels for each input. In experiments, the attacks is shown to be effective while rarely requiring more than $50$k queries on ImageNet.

Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).

Your comment:

Write your summary here (You can use $\LaTeX$ and markdown syntax):

Anon Private