Simple Black-Box Adversarial Perturbations for Deep Networks

Nina Narodytska and Shiva Prasad Kasiviswanathan

arXiv e-Print archive - 2016 via Local arXiv

Keywords: cs.LG, cs.CR, stat.ML

**First published:** 2016/12/19 (6 years ago)

**Abstract:** Deep neural networks are powerful and popular learning models that achieve
state-of-the-art pattern recognition performance on many computer vision,
speech, and language processing tasks. However, these networks have also been
shown susceptible to carefully crafted adversarial perturbations which force
misclassification of the inputs. Adversarial examples enable adversaries to
subvert the expected system behavior leading to undesired consequences and
could pose a security risk when these systems are deployed in the real world.
In this work, we focus on deep convolutional neural networks and demonstrate
that adversaries can easily craft adversarial examples even without any
internal knowledge of the target network. Our attacks treat the network as an
oracle (black-box) and only assume that the output of the network can be
observed on the probed inputs. Our first attack is based on a simple idea of
adding perturbation to a randomly selected single pixel or a small set of them.
We then improve the effectiveness of this attack by carefully constructing a
small set of pixels to perturb by using the idea of greedy local-search. Our
proposed attacks also naturally extend to a stronger notion of
misclassification. Our extensive experimental results illustrate that even
these elementary attacks can reveal a deep neural network's vulnerabilities.
The simplicity and effectiveness of our proposed schemes mean that they could
serve as a litmus test for designing robust networks.
more
less

Nina Narodytska and Shiva Prasad Kasiviswanathan

arXiv e-Print archive - 2016 via Local arXiv

Keywords: cs.LG, cs.CR, stat.ML

[link]
Table 4, 5, with only $.5\%$ of the pixels, you can get to $90\%$ missclassification, and it is a blackbox attack. #### LocSearchAdv Algorithm For $R$ rounds, at each round find $t$ top pixels that if you were to perturb them without bounds they could affect the classification the most. Then perturb each of the $t$ pixels such that they stay within the bounds (the magnitude of perturbation is a fixed value $r$). The top $t$ pixels are chosen from a subset of $P$ which is around $10\%$ of pixels; at the end of each round $P$ is updated to be the neighborhood of size $d\times d$ around the last $t$ top pixels. |

Reward Augmented Maximum Likelihood for Neural Structured Prediction

Mohammad Norouzi and Samy Bengio and Zhifeng Chen and Navdeep Jaitly and Mike Schuster and Yonghui Wu and Dale Schuurmans

arXiv e-Print archive - 2016 via Local arXiv

Keywords: cs.LG

**First published:** 2016/09/01 (6 years ago)

**Abstract:** A key problem in structured output prediction is direct optimization of the
task reward function that matters for test evaluation. This paper presents a
simple and computationally efficient approach to incorporate task reward into a
maximum likelihood framework. We establish a connection between the
log-likelihood and regularized expected reward objectives, showing that at a
zero temperature, they are approximately equivalent in the vicinity of the
optimal solution. We show that optimal regularized expected reward is achieved
when the conditional distribution of the outputs given the inputs is
proportional to their exponentiated (temperature adjusted) rewards. Based on
this observation, we optimize conditional log-probability of edited outputs
that are sampled proportionally to their scaled exponentiated reward. We apply
this framework to optimize edit distance in the output label space. Experiments
on speech recognition and machine translation for neural sequence to sequence
models show notable improvements over a maximum likelihood baseline by using
edit distance augmented maximum likelihood.
more
less

Mohammad Norouzi and Samy Bengio and Zhifeng Chen and Navdeep Jaitly and Mike Schuster and Yonghui Wu and Dale Schuurmans

arXiv e-Print archive - 2016 via Local arXiv

Keywords: cs.LG

[link]
Assume you have both label and reward (some flexibility in accepting the output) as is the case for hamming distance and negative edit distance. Assume the output space is discrete, it can be a quantized n-dim space. Assume that given an input and a set of labels, computing the probability of the labels is $O(L)$, meaning that most of the computation is independent of the label. (not sure about this! You certainly don't want to sample from the network) Suppose that you want to maximize an approximate objective which is a tradeoff between optimizing the reward or the label and you control this tradeoff by one parameter $\tau$ which you fix beforehand. This actually only directly affects the reward. Then you can compute the loss, by sampling from each discrete value of reward and then sampling from those labels that have such reward. Therefore, you can quickly approximate the loss and the gradient through sampling. And the samples are independent of the output of the network, but rather fixed given $\tau$ and ground-truth. This is as opposed to sampling the network and sampling around the predicted label. #### 2.2 Sampling from the exponentiated payoff distribution Note that $q$ should be defined such that the entropy is fixed given $\tau$. The paper notes that given the ground-truth label, if the loss is hamming distance or negative edit distance, then samples can be put into buckets of same loss (loss is discrete). Then One possible family of distributions controlled by $\tau$ is given by a normalized count reweighted by $\exp\{e/\tau\}$ for each bucket. By varying $\tau$, you can go from uniform distribution ($\tau=\infty$) to delta ($\tau=0$). Eq. 11 shows the simplified count for each bucket to be used for normalization and further sampling. As noted above the equation, edge cases like repetitions is ignored in this count to make the computation easy. Given a fixed entropy $q$, then you can compute the RML loss by sampling from $q$ by first selecting an edit distance, and then sampling from the corresponding bucket. #### Footnotes I was not satisfied with the results. I didn't quite follow section 3. |

About