Welcome to ShortScience.org! |

- ShortScience.org is a platform for post-publication discussion aiming to improve accessibility and reproducibility of research ideas.
- The website has 1584 public summaries, mostly in machine learning, written by the community and organized by paper, conference, and year.
- Reading summaries of papers is useful to obtain the perspective and insight of another reader, why they liked or disliked it, and their attempt to demystify complicated sections.
- Also, writing summaries is a good exercise to understand the content of a paper because you are forced to challenge your assumptions when explaining it.
- Finally, you can keep up to date with the flood of research by reading the latest summaries on our Twitter and Facebook pages.

Certifying Some Distributional Robustness with Principled Adversarial Training

Aman Sinha and Hongseok Namkoong and John Duchi

arXiv e-Print archive - 2017 via Local arXiv

Keywords: stat.ML, cs.LG

**First published:** 2017/10/29 (6 years ago)

**Abstract:** Neural networks are vulnerable to adversarial examples and researchers have
proposed many heuristic attack and defense mechanisms. We address this problem
through the principled lens of distributionally robust optimization, which
guarantees performance under adversarial input perturbations. By considering a
Lagrangian penalty formulation of perturbing the underlying data distribution
in a Wasserstein ball, we provide a training procedure that augments model
parameter updates with worst-case perturbations of training data. For smooth
losses, our procedure provably achieves moderate levels of robustness with
little computational or statistical cost relative to empirical risk
minimization. Furthermore, our statistical guarantees allow us to efficiently
certify robustness for the population loss. For imperceptible perturbations,
our method matches or outperforms heuristic approaches.
more
less

Aman Sinha and Hongseok Namkoong and John Duchi

arXiv e-Print archive - 2017 via Local arXiv

Keywords: stat.ML, cs.LG

[link]
Sinha et al. introduce a variant of adversarial training based on distributional robust optimization. I strongly recommend reading the paper for understanding the introduced theoretical framework. The authors also provide guarantees on the obtained adversarial loss – and show experimentally that this guarantee is a realistic indicator. The adversarial training variant itself follows the general strategy of training on adversarially perturbed training samples in a min-max framework. In each iteration, an attacker crafts an adversarial examples which the network is trained on. In a nutshell, their approach differs from previous ones (apart from the theoretical framework) in the used attacker. Specifically, their attacker optimizes $\arg\max_z l(\theta, z) - \gamma \|z – z^t\|_p^2$ where $z^t$ is a training sample chosen randomly during training. On a side note, I also recommend reading the reviews of this paper: https://openreview.net/forum?id=Hk6kPgZA- Also view this summary at [davidstutz.de](https://davidstutz.de/category/reading/). |

Meta-Learning via Learned Loss

Sarah Bechtle and Artem Molchanov and Yevgen Chebotar and Edward Grefenstette and Ludovic Righetti and Gaurav Sukhatme and Franziska Meier

arXiv e-Print archive - 2019 via Local arXiv

Keywords: cs.LG, cs.AI, cs.RO, stat.ML

**First published:** 2019/06/12 (5 years ago)

**Abstract:** Typically, loss functions, regularization mechanisms and other important
aspects of training parametric models are chosen heuristically from a limited
set of options. In this paper, we take the first step towards automating this
process, with the view of producing models which train faster and more
robustly. Concretely, we present a meta-learning method for learning parametric
loss functions that can generalize across different tasks and model
architectures. We develop a pipeline for meta-training such loss functions,
targeted at maximizing the performance of the model trained under them. The
loss landscape produced by our learned losses significantly improves upon the
original task-specific losses in both supervised and reinforcement learning
tasks. Furthermore, we show that our meta-learning framework is flexible enough
to incorporate additional information at meta-train time. This information
shapes the learned loss function such that the environment does not need to
provide this information during meta-test time.
more
less

Sarah Bechtle and Artem Molchanov and Yevgen Chebotar and Edward Grefenstette and Ludovic Righetti and Gaurav Sukhatme and Franziska Meier

arXiv e-Print archive - 2019 via Local arXiv

Keywords: cs.LG, cs.AI, cs.RO, stat.ML

[link]
Bechtle et al. propose meta learning via learned loss ($ML^3$) and derive and empirically evaluate the framework on classification, regression, model-based and model-free reinforcement learning tasks. The problem is formalized as learning parameters $\Phi$ of a meta loss function $M_\phi$ that computes loss values $L_{learned} = M_{\Phi}(y, f_{\theta}(x))$. Following the outer-inner loop meta algorithm design the learned loss $L_{learned}$ is used to update the parameters of the learner in the inner loop via gradient descent: $\theta_{new} = \theta - \alpha \nabla_{\theta}L_{learned} $. The key contribution of the paper is the way to construct a differentiable learning signal for the loss parameters $\Phi$. The framework requires to specify a task loss $L_T$ during meta train time, which can be for example the mean squared error for regression tasks. After updating the model parameters to $\theta_{new}$ the task loss is used to measure how much learning progress has been made with loss parameters $\Phi$. The key insight is the decomposition via chain-rule of $\nabla_{\Phi} L_T(y, f_{\theta_{new}})$: $\nabla_{\Phi} L_T(y, f_{\theta_{new}}) = \nabla_f L_t \nabla_{\theta_{new}}f_{\theta_{new}} \nabla_{\Phi} \theta_{new} = \nabla_f L_t \nabla_{\theta_{new}}f_{\theta_{new}} [\theta - \alpha \nabla_{\theta} \mathbb{E}[M_{\Phi}(y, f_{\theta}(x))]]$. This allows to update the loss parameters with gradient descent as: $\Phi_{new} = \Phi - \eta \nabla_{\Phi} L_T(y, f_{\theta_{new}})$. This update rules yield the following $ML^3$ algorithm for supervised learning tasks: https://i.imgur.com/tSaTbg8.png For reinforcement learning the task loss is the expected future reward of policies induced by the policy $\pi_{\theta}$, for model-based rl with respect to the approximate dynamics model and for the model free case a system independent surrogate: $L_T(\pi_{\theta_{new}}) = -\mathbb{E}_{\pi_{\theta_{new}}} \left[ R(\tau_{\theta_{new}}) \log \pi_{\theta_{new}}(\tau_{new})\right] $. The allows further to incorporate extra information via an additional loss term $L_{extra}$ and to consider the augmented task loss $\beta L_T + \gamma L_{extra} $, with weights $\beta, \gamma$ at train time. Possible extra loss terms are used to add physics priors, encouragement of exploratory behavior or to incorporate expert demonstrations. The experiments show that this, at test time unavailable information, is retained in the shape of the loss landscape. The paper is packed with insightful experiments and shows that the learned loss function: - yields in regression and classification better accuracies at train and test tasks - generalizes well and speeds up learning in model based rl tasks - yields better generalization and faster learning in model free rl - is agnostic across a bunch of evaluated architectures (2,3,4,5 layers) - with incorporated extra knowledge yields better performance than without and is superior to alternative approaches like iLQR in a MountainCar task. The paper introduces a promising alternative, by learning the loss parameters, to MAML like approaches that learn the model parameters. It would be interesting to see if the learned loss function generalizes better than learned model parameters to a broader distribution of tasks like the meta-world tasks. |

Fast R-CNN

Girshick, Ross B.

International Conference on Computer Vision - 2015 via Local Bibsonomy

Keywords: dblp

Girshick, Ross B.

International Conference on Computer Vision - 2015 via Local Bibsonomy

Keywords: dblp

[link]
This method is based on improving the speed of R-CNN \cite{conf/cvpr/GirshickDDM14} 1. Where R-CNN would have two different objective functions, Fast R-CNN combines localization and classification losses into a "multi-task loss" in order to speed up training. 2. It also uses a pooling method based on \cite{journals/pami/HeZR015} called the RoI pooling layer that scales the input so the images don't have to be scaled before being set an an input image to the CNN. "RoI max pooling works by dividing the $h \times w$ RoI window into an $H \times W$ grid of sub-windows of approximate size $h/H \times w/W$ and then max-pooling the values in each sub-window into the corresponding output grid cell." 3. Backprop is performed for the RoI pooling layer by taking the argmax of the incoming gradients that overlap the incoming values. This method is further improved by the paper "Faster R-CNN" \cite{conf/nips/RenHGS15} |

Generative adversarial networks uncover epidermal regulators and predict single cell perturbations

Arsham Ghahramani and Fiona M Watt and Nicholas M Luscombe

bioRxiv: The preprint server for biology - 2018 via Local CrossRef

Keywords:

Arsham Ghahramani and Fiona M Watt and Nicholas M Luscombe

bioRxiv: The preprint server for biology - 2018 via Local CrossRef

Keywords:

[link]
Lee et al. propose a variant of adversarial training where a generator is trained simultaneously to generated adversarial perturbations. This approach follows the idea that it is possible to “learn” how to generate adversarial perturbations (as in [1]). In this case, the authors use the gradient of the classifier with respect to the input as hint for the generator. Both generator and classifier are then trained in an adversarial setting (analogously to generative adversarial networks), see the paper for details. [1] Omid Poursaeed, Isay Katsman, Bicheng Gao, Serge Belongie. Generative Adversarial Perturbations. ArXiv, abs/1712.02328, 2017. |

Convolutional Neural Networks for Sentence Classification

Kim, Yoon

arXiv e-Print archive - 2014 via Local Bibsonomy

Keywords: dblp

Kim, Yoon

arXiv e-Print archive - 2014 via Local Bibsonomy

Keywords: dblp

[link]
#### Introduction * The paper demonstrates how simple CNNs, built on top of word embeddings, can be used for sentence classification tasks. * [Link to the paper](https://arxiv.org/abs/1408.5882) * [Implementation](https://github.com/shagunsodhani/CNN-Sentence-Classifier) #### Architecture * Pad input sentences so that they are of the same length. * Map words in the padded sentence using word embeddings (which may be either initialized as zero vectors or initialized as word2vec embeddings) to obtain a matrix corresponding to the sentence. * Apply convolution layer with multiple filter widths and feature maps. * Apply max-over-time pooling operation over the feature map. * Concatenate the pooling results from different layers and feed to a fully-connected layer with softmax activation. * Softmax outputs probabilistic distribution over the labels. * Use dropout for regularisation. #### Hyperparameters * RELU activation for convolution layers * Filter window of 3, 4, 5 with 100 feature maps each. * Dropout - 0.5 * Gradient clipping at 3 * Batch size - 50 * Adadelta update rule. #### Variants * CNN-rand * Randomly initialized word vectors. * CNN-static * Uses pre-trained vectors from word2vec and does not update the word vectors. * CNN-non-static * Same as CNN-static but updates word vectors during training. * CNN-multichannel * Uses two set of word vectors (channels). * One set is updated and other is not updated. #### Datasets * Sentiment analysis datasets for Movie Reviews, Customer Reviews etc. * Classification data for questions. * Maximum number of classes for any dataset - 6 #### Strengths * Good results on benchmarks despite being a simple architecture. * Word vectors obtained by non-static channel have more meaningful representation. #### Weakness * Small data with few labels. * Results are not very detailed or exhaustive. |

About