Welcome to ShortScience.org!

- ShortScience.org is a platform for post-publication discussion aiming to improve accessibility and reproducibility of research ideas.
- The website has 1584 public summaries, mostly in machine learning, written by the community and organized by paper, conference, and year.
- Reading summaries of papers is useful to obtain the perspective and insight of another reader, why they liked or disliked it, and their attempt to demystify complicated sections.
- Also, writing summaries is a good exercise to understand the content of a paper because you are forced to challenge your assumptions when explaining it.
- Finally, you can keep up to date with the flood of research by reading the latest summaries on our Twitter and Facebook pages.

About Microservices, Containers and their Underestimated Impact on Network Performance

Nane Kratzke

arXiv e-Print archive - 2017 via Local arXiv

Keywords: cs.DC

**First published:** 2017/09/14 (6 years ago)

**Abstract:** Microservices are used to build complex applications composed of small,
independent and highly decoupled processes. Recently, microservices are often
mentioned in one breath with container technologies like Docker. That is why
operating system virtualization experiences a renaissance in cloud computing.
These approaches shall provide horizontally scalable, easily deployable systems
and a high-performance alternative to hypervisors. Nevertheless, performance
impacts of containers on top of hypervisors are hardly investigated.
Furthermore, microservice frameworks often come along with software defined
networks. This contribution presents benchmark results to quantify the impacts
of container, software defined networking and encryption on network
performance. Even containers, although postulated to be lightweight, show a
noteworthy impact to network performance. These impacts can be minimized on
several system layers. Some design recommendations for cloud deployed systems
following the microservice architecture pattern are derived.

### Contribution
The author conducts five experiments on EC2 to assess the impact of software-defined virtual networking (SDVN) on HTTP-based composite container applications. Compared to previous container performance studies, the paper contributes new insight into the overlay networking aspect, specifically for VM-hosted containers. The results show that the SDVN causes a major performance loss, whereas the container itself and the encryption cause minor (but still not negligible) losses. This indicates that further practical work on container networking tools and stacks is needed for performance-critical distributed applications.

### Strong points
The methodology of measuring performance against a baseline result is appropriate. The author provides the benchmark tooling (ppbench) and reference results (in dockerised form) to enable recomputable research.

### Weak points
The title mentions microservices, and the abstract promises design recommendations for microservice architectures. Yet the paper only discusses containers, which are one possible implementation technology for microservices but are neither necessary for them nor a guarantee of them. Reducing the paper's scope to containers would be fair.

The introduction mentions Kubernetes, CoreOS, Mesos and reference [9] twice around the column wrap, which is redundant. The notation SDN vs. SDVN is inconsistent between text and figures; since SDN is a wide area of research, consistent use of SDVN is recommended.

Fig. 3b is not clearly labelled: in "Resulting transfer losses", 100% means no loss, which is confusing. The y axis should presumably be inverted so that losses are highest for SDN, at about 70%.

The performance breakdown around 300 kB messages in Fig. 2 is not sufficiently explained. Is it a repeating phenomenon, possibly related to packet scheduling? The "just Docker" networking configuration is not explained either: does it run in host or bridge mode? Which version of Docker was used? The size and time distribution of the 6 million HTTP requests should also be described in greater detail to show how much randomness was involved.

### Further comments
The work assumes that containers are always hosted in virtual machines, while bare-metal container hosting in the form of CaaS is becoming increasingly available (Triton, CoreOS OnMetal, etc.). The results by Felter et al. are mentioned but not put into perspective. A comparison of how networking is affected by VM vs. bare-metal hosting would be a welcome addition, although AWS would probably not be a suitable environment, since ECS runs atop EC2.
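The axis convention criticised for Fig. 3b can be made explicit with a short calculation against the baseline. A minimal sketch, assuming the transfer rates of a configuration and of the bare baseline are measured in the same unit for the same message size; the function names and numbers are hypothetical and are not the paper's data.

```python
def relative_transfer(measured, baseline):
    """Transfer rate of a configuration as a percentage of the baseline (100.0 = no loss)."""
    if baseline <= 0:
        raise ValueError("baseline transfer rate must be positive")
    return 100.0 * measured / baseline


def transfer_loss(measured, baseline):
    """Transfer loss in percent relative to the baseline (0.0 = no loss)."""
    return 100.0 - relative_transfer(measured, baseline)


# Hypothetical transfer rates (same unit, same message size); not measurements from the paper.
baseline_rate = 100.0
for config, rate in {"Docker": 90.0, "Docker + SDVN": 40.0}.items():
    print(f"{config}: {relative_transfer(rate, baseline_rate):.1f}% of baseline, "
          f"{transfer_loss(rate, baseline_rate):.1f}% loss")
```

Plotting `transfer_loss` instead of `relative_transfer` gives the inverted axis suggested above, so the configuration with the largest loss gets the tallest bar.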

Certifying Some Distributional Robustness with Principled Adversarial Training

Aman Sinha and Hongseok Namkoong and John Duchi

arXiv e-Print archive - 2017 via Local arXiv

Keywords: stat.ML, cs.LG

**First published:** 2017/10/29 (6 years ago)

**Abstract:** Neural networks are vulnerable to adversarial examples and researchers have
proposed many heuristic attack and defense mechanisms. We address this problem
through the principled lens of distributionally robust optimization, which
guarantees performance under adversarial input perturbations. By considering a
Lagrangian penalty formulation of perturbing the underlying data distribution
in a Wasserstein ball, we provide a training procedure that augments model
parameter updates with worst-case perturbations of training data. For smooth
losses, our procedure provably achieves moderate levels of robustness with
little computational or statistical cost relative to empirical risk
minimization. Furthermore, our statistical guarantees allow us to efficiently
certify robustness for the population loss. For imperceptible perturbations,
our method matches or outperforms heuristic approaches.

Sinha et al. introduce a variant of adversarial training based on distributionally robust optimization. I strongly recommend reading the paper to understand the introduced theoretical framework. The authors also provide guarantees on the obtained adversarial loss and show experimentally that this guarantee is a realistic indicator. The adversarial training variant itself follows the general strategy of training on adversarially perturbed training samples in a min-max framework: in each iteration, an attacker crafts adversarial examples on which the network is then trained. In a nutshell, apart from the theoretical framework, their approach differs from previous ones in the attacker used. Specifically, their attacker optimizes $\arg\max_z l(\theta, z) - \gamma \|z - z^t\|_p^2$, where $z^t$ is a training sample chosen randomly during training.

On a side note, I also recommend reading the reviews of this paper: https://openreview.net/forum?id=Hk6kPgZA-

Also view this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
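The attacker's objective can be illustrated with plain gradient ascent. A minimal sketch, assuming $p = 2$, a hypothetical logistic-regression loss $l(\theta, (z, y)) = \log(1 + e^{-y\,\theta^\top z})$ in which only the input $z$ is perturbed, and hand-derived gradients; it is not the authors' code. When $\gamma$ is large relative to the smoothness of the loss, the penalized objective is strongly concave in $z$, so the ascent converges to its unique maximizer; a training step would then update $\theta$ on the returned sample.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def logistic_loss(theta, z, y):
    """Hypothetical loss l(theta, (z, y)) = log(1 + exp(-y * theta^T z)), labels y in {-1, +1}."""
    m = y * dot(theta, z)
    # Numerically stable log(1 + exp(-m)).
    return math.log1p(math.exp(-m)) if m > 0 else -m + math.log1p(math.exp(m))


def grad_z_loss(theta, z, y):
    """Gradient of the logistic loss with respect to the input z: -y * sigmoid(-m) * theta."""
    m = y * dot(theta, z)
    s = math.exp(-m) / (1.0 + math.exp(-m)) if m > 0 else 1.0 / (1.0 + math.exp(m))
    return [-y * s * t for t in theta]


def penalized_objective(theta, z, y, z0, gamma):
    """Attacker objective l(theta, z) - gamma * ||z - z0||_2^2 (the p = 2 case)."""
    dist2 = sum((zi - z0i) ** 2 for zi, z0i in zip(z, z0))
    return logistic_loss(theta, z, y) - gamma * dist2


def penalized_gradient(theta, z, y, z0, gamma):
    g = grad_z_loss(theta, z, y)
    return [gi - 2.0 * gamma * (zi - z0i) for gi, zi, z0i in zip(g, z, z0)]


def wrm_attack(theta, z0, y, gamma, lr=0.1, steps=200):
    """Gradient ascent on the penalized objective, starting from the clean sample z0."""
    z = list(z0)
    for _ in range(steps):
        g = penalized_gradient(theta, z, y, z0, gamma)
        z = [zi + lr * gi for zi, gi in zip(z, g)]
    return z


# Usage with hypothetical values: the perturbed sample has a higher loss than the clean one.
theta = [1.0, -2.0]
z_clean, label = [0.5, 0.5], 1.0
z_adv = wrm_attack(theta, z_clean, label, gamma=1.0)
print(logistic_loss(theta, z_clean, label), logistic_loss(theta, z_adv, label))
```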

Meta-Learning via Learned Loss

Sarah Bechtle and Artem Molchanov and Yevgen Chebotar and Edward Grefenstette and Ludovic Righetti and Gaurav Sukhatme and Franziska Meier

arXiv e-Print archive - 2019 via Local arXiv

Keywords: cs.LG, cs.AI, cs.RO, stat.ML

**First published:** 2019/06/12 (5 years ago)

**Abstract:** Typically, loss functions, regularization mechanisms and other important
aspects of training parametric models are chosen heuristically from a limited
set of options. In this paper, we take the first step towards automating this
process, with the view of producing models which train faster and more
robustly. Concretely, we present a meta-learning method for learning parametric
loss functions that can generalize across different tasks and model
architectures. We develop a pipeline for meta-training such loss functions,
targeted at maximizing the performance of the model trained under them. The
loss landscape produced by our learned losses significantly improves upon the
original task-specific losses in both supervised and reinforcement learning
tasks. Furthermore, we show that our meta-learning framework is flexible enough
to incorporate additional information at meta-train time. This information
shapes the learned loss function such that the environment does not need to
provide this information during meta-test time.

Bechtle et al. propose meta-learning via learned loss ($ML^3$) and derive and empirically evaluate the framework on classification, regression, and model-based and model-free reinforcement learning tasks. The problem is formalized as learning the parameters $\Phi$ of a meta-loss function $M_{\Phi}$ that computes loss values $L_{learned} = M_{\Phi}(y, f_{\theta}(x))$. Following the outer-inner loop design of meta-learning algorithms, the learned loss $L_{learned}$ is used to update the parameters of the learner in the inner loop via gradient descent: $\theta_{new} = \theta - \alpha \nabla_{\theta}L_{learned}$. The key contribution of the paper is the way it constructs a differentiable learning signal for the loss parameters $\Phi$. The framework requires specifying a task loss $L_T$ at meta-train time, for example the mean squared error for regression tasks. After the model parameters are updated to $\theta_{new}$, the task loss is used to measure how much learning progress has been made with the loss parameters $\Phi$. The key insight is the chain-rule decomposition of $\nabla_{\Phi} L_T(y, f_{\theta_{new}}(x))$: $\nabla_{\Phi} L_T(y, f_{\theta_{new}}(x)) = \nabla_f L_T \, \nabla_{\theta_{new}} f_{\theta_{new}} \, \nabla_{\Phi} \theta_{new} = \nabla_f L_T \, \nabla_{\theta_{new}} f_{\theta_{new}} \, \nabla_{\Phi} \left[\theta - \alpha \nabla_{\theta} \mathbb{E}[M_{\Phi}(y, f_{\theta}(x))]\right]$. This allows the loss parameters to be updated with gradient descent: $\Phi_{new} = \Phi - \eta \nabla_{\Phi} L_T(y, f_{\theta_{new}}(x))$.
These update rules yield the following $ML^3$ algorithm for supervised learning tasks: https://i.imgur.com/tSaTbg8.png

For reinforcement learning, the task loss is based on the expected future reward of trajectories induced by the policy $\pi_{\theta}$: for model-based RL it is evaluated with respect to the approximate dynamics model, and for the model-free case a system-independent surrogate is used: $L_T(\pi_{\theta_{new}}) = -\mathbb{E}_{\pi_{\theta_{new}}} \left[ R(\tau_{\theta_{new}}) \log \pi_{\theta_{new}}(\tau_{\theta_{new}})\right]$. The framework further allows incorporating extra information via an additional loss term $L_{extra}$ by considering the augmented task loss $\beta L_T + \gamma L_{extra}$, with weights $\beta, \gamma$, at meta-train time. Such extra loss terms can, for example, add physics priors, encourage exploratory behavior, or incorporate expert demonstrations. The experiments show that this information, although unavailable at test time, is retained in the shape of the learned loss landscape.

The paper is packed with insightful experiments and shows that the learned loss function:

- yields better accuracy on both train and test tasks in regression and classification;
- generalizes well and speeds up learning in model-based RL tasks;
- yields better generalization and faster learning in model-free RL;
- is agnostic across the evaluated architectures (2, 3, 4 and 5 layers);
- with incorporated extra knowledge, yields better performance than without it and is superior to alternative approaches such as iLQR on a MountainCar task.

By learning loss parameters, the paper introduces a promising alternative to MAML-like approaches that learn model parameters. It would be interesting to see whether the learned loss function generalizes better than learned model parameters to a broader distribution of tasks, such as the Meta-World tasks.
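To make the supervised chain-rule meta-gradient concrete, here is a toy sketch in plain Python. It is not the authors' implementation: the learner is a hypothetical 1-D linear model $f_\theta(x) = \theta x$ fit to targets $y = 3x$, the learned loss is a hypothetical two-parameter family $M_\Phi(y, f) = \Phi_0 (f - y)^2 + \Phi_1 f^2$ instead of a neural network, and gradients are derived by hand rather than with automatic differentiation. Each episode draws a fresh learner initialization, takes one inner step on $M_\Phi$, and updates $\Phi$ with $\nabla_\Phi L_T$ obtained from the chain rule.

```python
import random

# Hypothetical toy task: 1-D linear model f_theta(x) = theta * x on targets y = 3x.
XS = [0.5, 1.0, 1.5, 2.0]
YS = [3.0 * x for x in XS]
ALPHA = 0.1  # inner-loop learning rate (alpha)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def inner_update(theta, phi):
    """One inner gradient step on the hypothetical learned loss
    M_phi(y, f) = phi[0] * (f - y)**2 + phi[1] * f**2."""
    grad = mean((2 * phi[0] * (theta * x - y) + 2 * phi[1] * theta * x) * x
                for x, y in zip(XS, YS))
    return theta - ALPHA * grad


def task_loss(theta):
    """Task loss L_T (mean squared error), only used at meta-train time."""
    return mean((theta * x - y) ** 2 for x, y in zip(XS, YS))


def meta_gradient(theta, phi):
    """Chain rule: dL_T/dphi = dL_T/dtheta_new * dtheta_new/dphi."""
    theta_new = inner_update(theta, phi)
    dlt_dtheta_new = mean(2 * (theta_new * x - y) * x for x, y in zip(XS, YS))
    # theta_new = theta - alpha * mean(dM/dtheta) is linear in phi, so this is exact.
    dtheta_new_dphi = [
        -ALPHA * mean(2 * (theta * x - y) * x for x, y in zip(XS, YS)),
        -ALPHA * mean(2 * theta * x * x for x in XS),
    ]
    return [dlt_dtheta_new * g for g in dtheta_new_dphi]


def meta_train(phi, episodes=2000, eta=0.05, seed=0):
    """Outer loop: Phi_new = Phi - eta * grad_Phi L_T, one sampled learner per episode."""
    rng = random.Random(seed)
    phi = list(phi)
    for _ in range(episodes):
        theta0 = rng.uniform(-1.0, 1.0)  # fresh learner initialization
        grad = meta_gradient(theta0, phi)
        phi = [p - eta * g for p, g in zip(phi, grad)]
    return phi


phi_init = [1.0, 0.0]
phi_learned = meta_train(phi_init)
print(phi_learned, task_loss(inner_update(0.5, phi_init)), task_loss(inner_update(0.5, phi_learned)))
```

In this toy setup, meta-training drives $\Phi_0$ toward the value at which a single inner step lands on the optimum, $1/(2\alpha\,\overline{x^2}) = 8/3$, and $\Phi_1$ toward zero, so the learned loss speeds up the inner learner.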

Fast R-CNN

Girshick, Ross B.

International Conference on Computer Vision - 2015 via Local Bibsonomy

Keywords: dblp

This method improves the speed of R-CNN \cite{conf/cvpr/GirshickDDM14}:

1. Whereas R-CNN trains localization and classification with separate objective functions, Fast R-CNN combines the localization and classification losses into a single "multi-task loss" in order to speed up training.
2. It uses a pooling method based on \cite{journals/pami/HeZR015}, the RoI pooling layer, which maps each region of interest, whatever its size, to a fixed-size feature map, so region proposals no longer have to be warped to a fixed input size before being fed to the CNN. "RoI max pooling works by dividing the $h \times w$ RoI window into an $H \times W$ grid of sub-windows of approximate size $h/H \times w/W$ and then max-pooling the values in each sub-window into the corresponding output grid cell."
3. Backpropagation through the RoI pooling layer routes the gradient of each output cell to the input value that was selected as the maximum (the argmax) in the forward pass; an input that is the maximum of several sub-windows accumulates the sum of their gradients.

This method is further improved by the paper "Faster R-CNN" \cite{conf/nips/RenHGS15}.
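RoI max pooling and its argmax-based backward pass can be sketched in a few lines. A minimal single-channel sketch in plain Python, assuming the RoI is already projected onto the feature map as inclusive integer corners and using floor/ceil bin boundaries (one common binning choice, assumed here rather than taken from the paper); function names are hypothetical.

```python
import math


def roi_max_pool(feature, roi, out_h, out_w):
    """Max-pool the RoI window of a 2-D feature map into an out_h x out_w grid.

    feature: 2-D list of floats; roi: (x0, y0, x1, y1) with inclusive corners.
    Returns the pooled grid and, per output cell, the (row, col) of the winning input.
    """
    x0, y0, x1, y1 = roi
    h, w = y1 - y0 + 1, x1 - x0 + 1
    pooled = [[0.0] * out_w for _ in range(out_h)]
    argmax = [[None] * out_w for _ in range(out_h)]
    for i in range(out_h):
        # Each sub-window spans roughly h / out_h rows; floor/ceil keeps every bin non-empty.
        r_start, r_end = y0 + math.floor(i * h / out_h), y0 + math.ceil((i + 1) * h / out_h)
        for j in range(out_w):
            c_start, c_end = x0 + math.floor(j * w / out_w), x0 + math.ceil((j + 1) * w / out_w)
            best = (r_start, c_start)
            for r in range(r_start, r_end):
                for c in range(c_start, c_end):
                    if feature[r][c] > feature[best[0]][best[1]]:
                        best = (r, c)
            pooled[i][j] = feature[best[0]][best[1]]
            argmax[i][j] = best
    return pooled, argmax


def roi_max_pool_backward(grad_pooled, argmax, feat_h, feat_w):
    """Route each output gradient to the input position that won the max in the forward pass."""
    grad_feature = [[0.0] * feat_w for _ in range(feat_h)]
    for i, row in enumerate(argmax):
        for j, (r, c) in enumerate(row):
            grad_feature[r][c] += grad_pooled[i][j]  # accumulate if one input wins several bins
    return grad_feature


fmap = [[float(4 * r + c + 1) for c in range(4)] for r in range(4)]
pooled, idx = roi_max_pool(fmap, (0, 0, 3, 3), 2, 2)
print(pooled)  # [[6.0, 8.0], [14.0, 16.0]]
```

When the RoI is not divisible by the grid size, neighbouring sub-windows overlap by one row or column, which is why the backward pass accumulates instead of overwriting.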

Generative adversarial networks uncover epidermal regulators and predict single cell perturbations

Arsham Ghahramani and Fiona M Watt and Nicholas M Luscombe

bioRxiv: The preprint server for biology - 2018 via Local CrossRef

Keywords:

Lee et al. propose a variant of adversarial training in which a generator is trained simultaneously to generate adversarial perturbations. This approach follows the idea that it is possible to “learn” how to generate adversarial perturbations (as in [1]). In this case, the authors use the gradient of the classifier with respect to the input as a hint for the generator. Generator and classifier are then trained in an adversarial setting (analogously to generative adversarial networks); see the paper for details.

[1] Omid Poursaeed, Isay Katsman, Bicheng Gao, Serge Belongie. Generative Adversarial Perturbations. ArXiv, abs/1712.02328, 2017.
