Welcome to ShortScience.org!

- ShortScience.org is a platform for post-publication discussion aiming to improve accessibility and reproducibility of research ideas.
- The website has 1567 public summaries, mostly in machine learning, written by the community and organized by paper, conference, and year.
- Reading summaries of papers is useful to obtain the perspective and insight of another reader, why they liked or disliked it, and their attempt to demystify complicated sections.
- Also, writing summaries is a good exercise to understand the content of a paper because you are forced to challenge your assumptions when explaining it.
- Finally, you can keep up to date with the flood of research by reading the latest summaries on our Twitter and Facebook pages.

Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask

Zhou, Hattie and Lan, Janice and Liu, Rosanne and Yosinski, Jason

- 2019 via Local Bibsonomy

Keywords: pruning, nas

The Lottery Ticket Hypothesis is the idea that you can train a deep network, set all but a small percentage of its high-magnitude weights to zero, and retrain the network using the connection topology of the remaining weights, but only if you re-initialize the unpruned weights to the values they had at the beginning of the first training run. This suggests that part of the value of training such big networks is not that we need that many parameters to use their expressive capacity, but that we need many "draws" from the weight and topology distribution to find initial weight patterns that are well-disposed for learning. This paper out of Uber is a refreshingly exploratory experimental work that tries to understand the contours and contingencies of this effect. Their findings include:

- The pruning criterion used in the original paper, where the weights with the highest final magnitude are kept, works well. However, an alternative criterion, where you keep the weights that increased the most in magnitude, works just as well and sometimes better. This makes a decent amount of sense, since magnitude seems to be used as a signal of "did this weight come to play a meaningful role during training," and weights whose influence increased during training fall into that category regardless of their starting point. https://i.imgur.com/wTkNBod.png
- The authors' next question was: other than re-initializing weights to their initial values, is there anything else we can do that captures all or part of the performance effect? The answer seems to be yes; they found that the most important thing is keeping the sign of each weight aligned with its sign at the starting point. As long as you do that, redrawing initial weights (but giving them the right sign), or re-setting weights to a correctly signed constant value, both work nearly as well as the actual starting values. https://i.imgur.com/JeujUr3.png
- Turning to the weights on the pruning chopping block, the authors find that, instead of zeroing out all pruned weights, they can get even better performance if they zero the weights that moved toward zero during training, and re-initialize (but freeze) the weights that moved away from zero. The logic is: "if the weight was trying to move to zero, bring it to zero; otherwise, re-initialize it." This performance remains high at even lower levels of training than does the initial zero-masking result.
- Finally, the authors found that just by performing the masking (i.e. keeping only weights with large final values), bringing those weights back to their initial values, and zeroing out the rest, *without any training at all*, they were able to get 40% test accuracy on MNIST, much better than chance. If they masked according to "large weights that kept the same sign during training," they could get a pretty incredible 80% test accuracy on MNIST. That is way below even simple trained models but, again, this model wasn't *trained*, and the only information about the data came in the form of a binary weight mask.

This paper doesn't really try to come up with explanations that wrap all of these results up neatly with a bow, and I really respect that. I think it's good for ML research culture for people to feel an affordance to just run a lot of targeted experiments aimed at explanation, and publish the results even if they don't quite make sense yet. I feel like on this problem (and to some extent in machine learning generally), we're the blind men each grabbing at one part of an elephant, trying to describe the whole. Hopefully, papers like this can bring us closer to understanding strange quirks of optimization like this one.
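The masking and re-initialization variants discussed above can be sketched in a few lines. This is a minimal numpy illustration, not the paper's code; the function names and the constant scale are mine, and `w_init`/`w_final` stand for a layer's weights before and after training:

```python
import numpy as np

def mask_final_magnitude(w_init, w_final, keep_frac):
    # Original lottery-ticket criterion: keep the weights whose
    # final magnitude is largest.
    k = max(1, int(round(keep_frac * w_final.size)))
    thresh = np.sort(np.abs(w_final), axis=None)[-k]
    return np.abs(w_final) >= thresh

def mask_magnitude_increase(w_init, w_final, keep_frac):
    # Alternative criterion the paper finds works as well or better:
    # keep the weights whose magnitude grew the most during training.
    growth = np.abs(w_final) - np.abs(w_init)
    k = max(1, int(round(keep_frac * growth.size)))
    thresh = np.sort(growth, axis=None)[-k]
    return growth >= thresh

def signed_constant_reinit(w_init, scale=0.1):
    # "Correctly signed constant" re-initialization: the finding is that
    # keeping each surviving weight's original sign is what matters.
    return np.sign(w_init) * scale
```

Retraining would then proceed on `w * mask`, with the masked-out entries held at zero.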

Towards Poisoning of Deep Learning Algorithms with Back-gradient Optimization

Luis Muñoz-González and Battista Biggio and Ambra Demontis and Andrea Paudice and Vasin Wongrassamee and Emil C. Lupu and Fabio Roli

Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security - AISec '17 - 2017 via Local CrossRef

Keywords:

Muñoz-González et al. propose a multi-class data poisoning attack against deep neural networks based on back-gradient optimization. They consider the common poisoning formulation stated as follows: $\max_{D_c} \min_w \mathcal{L}(D_c \cup D_{tr}, w)$ where $D_c$ denotes a set of poisoned training samples and $D_{tr}$ the corresponding clean dataset. Here, the loss $\mathcal{L}$ used for training is minimized as the inner optimization problem. As a result, as long as learning itself does not have a closed-form solution, e.g., for deep neural networks, the problem is computationally infeasible. To resolve this problem, the authors propose using back-gradient optimization: the gradient with respect to the outer optimization problem can then be computed while running only a limited number of iterations of the inner problem; see the paper for details. In experiments on spam/malware detection and digit classification, the approach is shown to increase the test error of the trained model with only a few training examples poisoned. Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
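The bilevel structure of the attack can be illustrated on a toy linear-regression victim. This is a hedged sketch rather than the paper's method: finite differences stand in for the back-gradient (reverse-mode) computation of the hypergradient, the inner learner is plain gradient descent, and all names and constants are mine:

```python
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
X_tr = rng.normal(size=(20, 2))        # clean training set D_tr
y_tr = X_tr @ w_true
X_val = rng.normal(size=(50, 2))       # attacker's validation set
y_val = X_val @ w_true

def inner_train(x_c, y_c=3.0, steps=50, lr=0.1):
    # Inner problem: gradient descent on D_tr plus one poison point (x_c, y_c).
    X = np.vstack([X_tr, x_c])
    y = np.append(y_tr, y_c)
    w = np.zeros(2)
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

def outer_loss(x_c):
    # Attacker objective: validation loss of the model trained on poisoned data.
    w = inner_train(x_c)
    return 0.5 * np.mean((X_val @ w - y_val) ** 2)

def hypergrad(x_c, eps=1e-5):
    # Gradient of the outer loss w.r.t. the poison point. The paper obtains
    # this by back-gradient differentiation through the truncated inner
    # updates; central finite differences stand in for it here.
    g = np.zeros_like(x_c)
    for i in range(x_c.size):
        e = np.zeros_like(x_c)
        e[i] = eps
        g[i] = (outer_loss(x_c + e) - outer_loss(x_c - e)) / (2 * eps)
    return g

# Gradient *ascent* on the poison point to maximize validation error.
x_c = np.array([1.0, 1.0])
for _ in range(10):
    x_c = x_c + 0.05 * hypergrad(x_c)
```

The point of back-gradient optimization is precisely to replace the finite-difference step above with an exact, cheap reverse pass through a truncated inner optimization.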

On the (Statistical) Detection of Adversarial Examples

Kathrin Grosse and Praveen Manoharan and Nicolas Papernot and Michael Backes and Patrick McDaniel

arXiv e-Print archive - 2017 via Local arXiv

Keywords: cs.CR, cs.LG, stat.ML

**First published:** 2017/02/21

**Abstract:** Machine Learning (ML) models are applied in a variety of tasks such as
network intrusion detection or Malware classification. Yet, these models are
vulnerable to a class of malicious inputs known as adversarial examples. These
are slightly perturbed inputs that are classified incorrectly by the ML model.
The mitigation of these adversarial inputs remains an open problem. As a step
towards understanding adversarial examples, we show that they are not drawn
from the same distribution as the original data, and can thus be detected
using statistical tests. Using this knowledge, we introduce a complementary
approach to identify specific inputs that are adversarial. Specifically, we
augment our ML model with an additional output, in which the model is trained
to classify all adversarial inputs. We evaluate our approach on multiple
adversarial example crafting methods (including the fast gradient sign and
saliency map methods) with several datasets. The statistical test flags sample
sets containing adversarial inputs confidently at sample sizes between 10 and
100 data points. Furthermore, our augmented model either detects adversarial
examples as outliers with high accuracy (> 80%) or increases the adversary's
cost - the perturbation added - by more than 150%. In this way, we show that
statistical properties of adversarial examples are essential to their
detection.

Grosse et al. use statistical tests to detect adversarial examples; additionally, machine learning algorithms are adapted to detect adversarial examples on the fly while performing classification. The idea of using statistical tests to detect adversarial examples is simple: assuming that there is a true data distribution, a machine learning algorithm can only approximate this distribution, i.e. each algorithm "learns" an approximate distribution. The ideal adversary exploits this discrepancy by drawing a sample where the data distribution and the learned distribution differ, resulting in misclassification. In practice, the authors show that kernel-based two-sample hypothesis testing can be used to identify a set of adversarial examples (but not individual ones). In order to also detect individual ones, each classifier is augmented to also predict whether its input is an adversarial example. This approach is similar to adversarial training, where adversarial examples are included in the training set with the correct label. However, I believe that it is possible to again craft new adversarial examples against the augmented classifier, as is also possible with adversarial training.
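A kernel two-sample test of this kind can be sketched with a (biased) maximum mean discrepancy estimate plus a permutation test. This is an illustrative numpy version, not the authors' implementation; the kernel bandwidth, sample sizes, and permutation count are arbitrary choices of mine:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian kernel matrix between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=1.0):
    # Biased estimate of the squared maximum mean discrepancy.
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())

def permutation_test(X, Y, n_perm=200, gamma=1.0, seed=0):
    # p-value for "X and Y come from the same distribution":
    # compare the observed MMD against MMDs under random relabelings.
    rng = np.random.default_rng(seed)
    obs = mmd2(X, Y, gamma)
    Z = np.vstack([X, Y])
    n = len(X)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(Z)
        count += mmd2(Z[:n], Z[n:], gamma) >= obs
    return (count + 1) / (n_perm + 1)
```

A small p-value on a batch of suspect inputs is evidence they were not drawn from the clean data distribution, which matches the paper's set-level (not per-input) detection claim.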

Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods

Nicholas Carlini and David Wagner

arXiv e-Print archive - 2017 via Local arXiv

Keywords: cs.LG, cs.CR, cs.CV

**First published:** 2017/05/20

**Abstract:** Neural networks are known to be vulnerable to adversarial examples: inputs
that are close to natural inputs but classified incorrectly. In order to better
understand the space of adversarial examples, we survey ten recent proposals
that are designed for detection and compare their efficacy. We show that all
can be defeated by constructing new loss functions. We conclude that
adversarial examples are significantly harder to detect than previously
appreciated, and the properties believed to be intrinsic to adversarial
examples are in fact not. Finally, we propose several simple guidelines for
evaluating future proposed defenses.

Carlini and Wagner study the effectiveness of adversarial example detectors as a defense strategy and show that most of them can be bypassed easily by known attacks. Specifically, they consider a set of adversarial example detection schemes, including neural networks as detectors and statistical tests. After extensive experiments, the authors provide a set of lessons, which include:

- Randomization is by far the most effective defense (e.g. dropout).
- Defenses seem to be dataset-specific; there is a discrepancy between defenses working well on MNIST and on CIFAR.
- Detection neural networks can easily be bypassed.

Additionally, they provide a set of recommendations for future work:

- When developing defense mechanisms, we always need to consider strong white-box attacks (i.e. attackers that are informed about the defense mechanism).
- Reporting accuracy only is not meaningful; instead, false positives and false negatives should be reported.
- Simple datasets such as MNIST and CIFAR are not enough for evaluation.

Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
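The reporting recommendation can be made concrete: for a thresholded detector, false positive and false negative rates are computed separately instead of being folded into one accuracy number. A minimal sketch (the score arrays, threshold, and function name are hypothetical, not from the paper):

```python
import numpy as np

def detection_rates(scores_clean, scores_adv, threshold):
    # The detector flags an input as adversarial when its score
    # exceeds the threshold.
    fpr = float(np.mean(scores_clean > threshold))  # clean inputs wrongly flagged
    fnr = float(np.mean(scores_adv <= threshold))   # adversarial inputs missed
    return fpr, fnr
```

Sweeping the threshold trades one error rate against the other, which is exactly the trade-off a single accuracy figure hides.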

Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients

Andrew Slavin Ross and Finale Doshi-Velez

arXiv e-Print archive - 2017 via Local arXiv

Keywords: cs.LG, cs.CR, cs.CV

**First published:** 2017/11/26

**Abstract:** Deep neural networks have proven remarkably effective at solving many
classification problems, but have been criticized recently for two major
weaknesses: the reasons behind their predictions are uninterpretable, and the
predictions themselves can often be fooled by small adversarial perturbations.
These problems pose major obstacles for the adoption of neural networks in
domains that require security or transparency. In this work, we evaluate the
effectiveness of defenses that differentiably penalize the degree to which
small changes in inputs can alter model predictions. Across multiple attacks,
architectures, defenses, and datasets, we find that neural networks trained
with this input gradient regularization exhibit robustness to transferred
adversarial examples generated to fool all of the other models. We also find
that adversarial examples generated to fool gradient-regularized models fool
all other models equally well, and actually lead to more "legitimate,"
interpretable misclassifications as rated by people (which we confirm in a
human subject experiment). Finally, we demonstrate that regularizing input
gradients makes them more naturally interpretable as rationales for model
predictions. We conclude by discussing this relationship between
interpretability and robustness in deep neural networks.

Ross and Doshi-Velez propose input gradient regularization to improve the robustness and interpretability of neural networks. As the discussion of interpretability is quite limited in the paper, the main contribution is an extensive evaluation of input gradient regularization against adversarial examples, in comparison to defenses such as distillation or adversarial training. Specifically, input gradient regularization as proposed in [1] is used: $\arg\min_\theta H(y,\hat{y}) + \lambda \|\nabla_x H(y,\hat{y})\|_2^2$ where $\theta$ are the network's parameters, $x$ its input and $\hat{y}$ the predicted output; $H$ might be a cross-entropy loss. It also becomes apparent why this regularization was originally called double backpropagation: the second derivative is necessary during training. In experiments, the authors show that the proposed regularization is superior to many other defenses, including distillation and adversarial training. Unfortunately, the comparison does not include other "regularization" techniques for improving robustness, such as Lipschitz regularization. This makes the comparison less interpretable, especially as the combination of input gradient regularization and adversarial training performs best (suggesting that adversarial training is a meaningful defense as well). Still, I recommend a closer look at the experiments. For example, the authors also study the input gradients of defended models, leading to some interesting conclusions. [1] H. Drucker, Y. LeCun. Improving generalization performance using double backpropagation. IEEE Transactions on Neural Networks, 1992. Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
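For a model whose input gradient has a closed form, the regularized objective is easy to write down explicitly. A sketch for logistic regression, where $\nabla_x H = (\sigma(w^\top x) - y)\,w$, so no double backpropagation is needed (for deep networks this penalty term is exactly what requires the second derivative during training); function and variable names are mine:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_loss(w, X, y, lam):
    """Cross-entropy plus lam times the squared norm of its input gradient.

    For logistic regression the input gradient is analytic:
    grad_x H = (sigmoid(w.x) - y) * w.
    """
    p = sigmoid(X @ w)
    eps = 1e-12  # numerical guard for the logarithms
    ce = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    input_grad = (p - y)[:, None] * w[None, :]        # one gradient per sample
    penalty = np.mean((input_grad ** 2).sum(axis=1))  # mean squared gradient norm
    return ce + lam * penalty
```

Minimizing this over `w` penalizes how strongly small input perturbations can move the loss, which is the mechanism the paper credits for both robustness and cleaner input gradients.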
