The International Conference on Learning Representations (ICLR) is the premier gathering of professionals dedicated to the advancement of the branch of artificial intelligence called representation learning, but generally referred to as deep learning.

Training Confidence-calibrated Classifiers for Detecting Out-of-Distribution Samples

Lee, Kimin and Lee, Honglak and Lee, Kibok and Shin, Jinwoo

International Conference on Learning Representations - 2018 via Local Bibsonomy

Keywords: dblp

Lee, Kimin and Lee, Honglak and Lee, Kibok and Shin, Jinwoo

International Conference on Learning Representations - 2018 via Local Bibsonomy

Keywords: dblp

[link]
Lee et al. propose a generative model for obtaining confidence-calibrated classifiers. Neural networks are known to be overconfident in their predictions – not only on examples from the task’s data distribution, but also on other examples taken from different distributions. The authors propose a GAN-based approach to force the classifier to predict uniform predictions on examples not taken from the data distribution. In particular, in addition to the target classifier, a generator and a discriminator are introduced. The generator generates “hard” out-of-distribution examples; ideally these examples are close to the in-distribution, i.e., the data distribution of the actual task. The discriminator is intended to distinguish between out- and in-distribution. The overall algorithm, including the necessary losses, is given in Algorithm 1. In experiments, the approach is shown to allow detecting out-distribution examples nearly perfectly. Examples of the generated “hard” out-of-distribution samples are given in Figure 1. https://i.imgur.com/NmF0fpN.png Algorithm 1: The proposed joint training scheme of out-distribution generator $G$, the in-/out-distribution discriminator $G$ and the original classifier providing $P_\theta$(y|x)$ with parameters $\theta$. https://i.imgur.com/kAclSQz.png Figure 1: A comparison of a regular GAN (a and c) to the proposed framework (c and d). Clearly, the proposed approach generates out-of-distribution samples (i.e., no meaningful digits) close to the original data distribution. |

Towards the first adversarially robust neural network model on MNIST

Lukas Schott and Jonas Rauber and Matthias Bethge and Wieland Brendel

arXiv e-Print archive - 2018 via Local arXiv

Keywords: cs.CV

**First published:** 2018/05/23 (3 years ago)

**Abstract:** Despite much effort, deep neural networks remain highly susceptible to tiny
input perturbations and even for MNIST, one of the most common toy datasets in
computer vision, no neural network model exists for which adversarial
perturbations are large and make semantic sense to humans. We show that even
the widely recognized and by far most successful defense by Madry et al. (1)
overfits on the L-infinity metric (it's highly susceptible to L2 and L0
perturbations), (2) classifies unrecognizable images with high certainty, (3)
performs not much better than simple input binarization and (4) features
adversarial perturbations that make little sense to humans. These results
suggest that MNIST is far from being solved in terms of adversarial robustness.
We present a novel robust classification model that performs analysis by
synthesis using learned class-conditional data distributions. We derive bounds
on the robustness and go to great length to empirically evaluate our model
using maximally effective adversarial attacks by (a) applying decision-based,
score-based, gradient-based and transfer-based attacks for several different Lp
norms, (b) by designing a new attack that exploits the structure of our
defended model and (c) by devising a novel decision-based attack that seeks to
minimize the number of perturbed pixels (L0). The results suggest that our
approach yields state-of-the-art robustness on MNIST against L0, L2 and
L-infinity perturbations and we demonstrate that most adversarial examples are
strongly perturbed towards the perceptual boundary between the original and the
adversarial class.
more
less

Lukas Schott and Jonas Rauber and Matthias Bethge and Wieland Brendel

arXiv e-Print archive - 2018 via Local arXiv

Keywords: cs.CV

[link]
Schott et al. propose an analysis-by-synthetis approach for adversarially robust MNIST classification. In particular, as illustrated in Figure 1, class-conditional variational auto-encoders (i.e., one variational auto-encoder per class) are learned. The respective recognition models, i.e., encoders, are discarded. For classification, the optimization problem $l_y^*(x) = \max_z \log p(x|z) - \text{KL}(\mathcal{N}(z, \sigma I)|\mathcal{N}(0,1))$ is solved for each class $z$. Here, $p(x|z)$ represents the learned generative model. The optimization problem leads a latent code $z$ corresponding to the best reconstruction of the input. The corresponding likelihood can be used for classificaiton using Bayes’ theorem. The obtained posteriors $p(y|x)$ are then scaled using a modified softmax (see paper) to obtain the final decision. (Additionally, input binarization is used as defense.) https://i.imgur.com/ignvoHQ.png Figure 1: The proposed analysis by synthesis approach to MNIST classification. The depicted generators are taken from class-specific variational auto-encoders. In addition to the proposed defense, Schott et al. also derive lower and upper bounds on the robustness of the classification procedure. These bounds can be derived from the optimization problem above, see the paper for details. In experiments, they show that their defense outperforms state-of-the-art adversarial training and allows to estimate tight bounds. In addition, the method is robust against distal adversarial examples and the adversarial examples look more meaningful, see Figure 2. https://i.imgur.com/uxGzzg1.png Figure 2: Adversarial examples for the proposed “ABS” method, its binary variant and related work. Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/). |

Generating Natural Adversarial Examples

Zhao, Zhengli and Dua, Dheeru and Singh, Sameer

International Conference on Learning Representations - 2018 via Local Bibsonomy

Keywords: dblp

Zhao, Zhengli and Dua, Dheeru and Singh, Sameer

International Conference on Learning Representations - 2018 via Local Bibsonomy

Keywords: dblp

[link]
Zhao et al. propose a generative adversarial network (GAN) based approach to generate meaningful and natural adversarial examples for images and text. With natural adversarial examples, the authors refer to meaningful changes in the image content instead of adding seemingly random/adversarial noise – as illustrated in Figure 1. These natural adversarial examples can be crafted by first learning a generative model of the data, e.g., using a GAN together with an inverter (similar to an encoder), see Figure 2. Then, given an image $x$ and its latent code $z$, adversarial examples $\tilde{z} = z + \delta$ can be found within the latent code. The hope is that these adversarial examples will correspond to meaningful, naturally looking adversarial examples in the image space. https://i.imgur.com/XBhHJuY.png Figure 1: Illustration of natural adversarial examples in comparison ot regular, FGSM adversarial examples. https://i.imgur.com/HT2StGI.png Figure 2: Generative model (GAN) together with the required inverter. In practice, e.g., on MNIST, any black-box classifier can be attacked by randomly sampling possible perturbations $\delta$ in the random space (with increasing norm) until an adversarial perturbation is found. Here, the inverted from Figure 2 is trained on top of the critic of the GAN (although specific details are missing in the paper). Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/). |

Temporal Difference Variational Auto-Encoder

Gregor, Karol and Besse, Frederic

arXiv e-Print archive - 2018 via Local Bibsonomy

Keywords: dblp

Gregor, Karol and Besse, Frederic

arXiv e-Print archive - 2018 via Local Bibsonomy

Keywords: dblp

[link]
This was definitely one of the more conceptually nuanced and complicated papers I’ve read recently, and I’ve only got about 60% confidence that I fully grasp all of its intuitions. However, I’m going to try to collect together what I did understand. There is a lot of research into generative models of text or image sequences, and some amount of research into building “models” in the reinforcement learning sense, where your model can predict future observations given current observations and your action. There’s an important underlying distinction here between model-based RL (where you learn a model of how the world evolves, and use that to optimize reward) and model-free RL (where you leave don’t bother explicitly learning a world model, and just directly try to optimize rewards) However, this paper identifies a few limitations of this research. 1) It’s largely focused on predicting observations, rather than predicting *state*. State is a bit of a fuzzy concept, and corresponds to, roughly, “the true underlying state of the game”. An example I like to use is a game where you walk in one door, and right next to it is a second door, which requires you to traverse the space and find rewards and a key before you can open. Now, imagine that the observation of your agent is it looking at the door. If the game doesn’t have any on-screen representation of the fact that you’ve found the key, it won’t be present in your observations, and you’ll observe the same thing at the point you have just entered and once you found the key. However, the state of the game at these two points will be quite different, in that in the latter case, your next states might be “opening the door” rather than “going to collect rewards”. Scenarios like this are referred to broadly as Partially Observable games or environments. This paper wants to build a model of how the game evolves into the future, but it wants to build a model of *state-to-state* evolution, rather than observation-to-observation evolution, since observations are typically both higher-dimensionality and also more noisy/less informative. 2) Past research has typically focused on predicting each next-step observation, rather than teaching models to be able to directly predict a state many steps in the future, without having to roll out the entire sequence of intermediate predictions. This is arguably quite valuable for making models that can predict the long term consequences of their decision This paper approaches that with an approach inspired by the Temporal Difference framework used in much of RL, in which you update your past estimate of future rewards by forcing it to be consistent with the actual observed rewards you encounter in the future. Except, in this model, we sample two a state (z1) and then a state some distance into the future (z2), and try to make our backwards-looking prediction of the state at time 1, taking into account observations that happened in between, match what our prediction was with only the information at time one. An important mechanistic nuance here is the idea of a “belief state”, something that captures all of your knowledge about game history up to a certain point. We can then directly sample a state Zt given the belief state Bt at that point. This isn’t actually possible with a model where we predict a state at time T given the state at time T-1, because the state at time Z-1 is itself a sample, and so in order to get a full distribution of Zt, you have to sample Zt over the distribution of Zt-1, and in order to get the distribution of Zt-1 you have to sample over the distribution of Zt-2, and so on and so on. Instead, we have a separate non-state variable, Bt that is created conditional on all our past observations (through a RNN). https://i.imgur.com/N0Al42r.png All said and done, the mechanics of this model look like: 1) Pick two points along the sequence trajectory 2) Calculate the belief state at each point, and, from that, construct a distribution over states at each timestep using p(z|b) 3) Have an additional model that predicts z1 given z2, b1, and b2 (that is, the future beliefs and states), and push the distribution over z1 from this model to be close to the distribution over z1 given only the information available at time t1 4) Have a model that predicts Z2 given Z1 and the time interval ahead that we’re jumping, and try to have this model be predictive/have high likelihood over the data 5) And, have a model that predicts an observation at time T2 given the state Z2, and train that so that we can convert our way back to an observation, given a state They mostly test it on fairly simple environments, but it’s an interesting idea, and I’d be curious to see other people develop it in future. (A strange aspect of this model is that, as far as I can tell, it’s non-interventionist, in that we’re not actually conditioning over agent action, or trying to learn a policy for an agent. This is purely trying to learn the long term transitions between states) |

Spatially Transformed Adversarial Examples

Xiao, Chaowei and Zhu, Jun-Yan and Li, Bo and He, Warren and Liu, Mingyan and Song, Dawn

arXiv e-Print archive - 2018 via Local Bibsonomy

Keywords: dblp

Xiao, Chaowei and Zhu, Jun-Yan and Li, Bo and He, Warren and Liu, Mingyan and Song, Dawn

arXiv e-Print archive - 2018 via Local Bibsonomy

Keywords: dblp

[link]
Xiao et al. propose adversarial examples based on spatial transformations. Actually, this work is very similar to the adversarial deformations of [1]. In particular, a deformation flow field is optimized (allowing individual deformations per pixel) to cause a misclassification. The distance of the perturbation is computed on the flow field directly. Examples on MNIST are shown in Figure 1 – it can clearly be seen that most pixels are moved individually and no kind of smoothness is enforced. They also show that commonly used defense mechanisms are more or less useless against these attacks. Unfortunately, and in contrast to [1], they do not consider adversarial training on their own adversarial transformations as defense. https://i.imgur.com/uDfttMU.png Figure 1: Examples of the computed adversarial examples/transformations on MNIST for three different models. Note that these are targeted attacks. [1] R. Alaifair, G. S. Alberti, T. Gauksson. Adef: an Iterative Algorithm to Construct Adversarial Deformations. ArXiv, abs/1804.07729v2, 2018. Also find this summary at [davidstutz.de](https://davidstutz.de/category/reading/). |

About