### Contribution

The author conducts five experiments on EC2 to assess the impact of software-defined virtual networking (SDVN) with HTTP on composite container applications. Compared to previous container performance studies, it contributes new insight into the overlay networking aspect specifically for VM-hosted containers. The results show that the SDVN causes a major performance loss, whereas the container itself as well as the encryption cause minor (but still not negligible) losses. The results indicate that further practical work on container networking tools and stacks is needed for performance-critical distributed applications.

### Strong points

The methodology of measuring the performance against a baseline performance result is appropriate. The author provides the benchmark tooling (ppbench) and reference results (in dockerised form) to enable recomputable research. (A minimal sketch of this kind of round-trip measurement is given after this review.)

### Weak points

- The title mentions microservices and the abstract promises design recommendations for microservice architectures. Yet, the paper only discusses containers, which are a potential implementation technology but neither necessary for nor guaranteed to yield microservices. Reducing the paper's scope to just containers would be fair.
- The introduction contains a redundant mention of Kubernetes, CoreOS, Mesos and reference [9] around the column wrap.
- The notation of SDN vs. SDVN is inconsistent between text and figures; since SDN is a wide area of research, consistent use of SDVN is recommended.
- Fig. 3b is not clearly labelled. The resulting transfer losses are plotted such that 100% means no loss, which is confusing; the y-axis should presumably be inverted so that the loss appears highest for the SDVN configuration, at about 70%.
- The performance breakdown around 300 kB messages in Fig. 2 is not sufficiently explained. Is it a repeating phenomenon, perhaps related to packet scheduling?
- The "just Docker" networking configuration is not explained: does it run in host or bridge mode, and which version of Docker was used?
- The size and time distribution of the 6 million HTTP requests should be explained in greater detail to show how much randomness was involved.

### Further comments

The work assumes that containers are always hosted in virtual machines, while bare-metal container hosting in the form of CaaS is becoming increasingly available (Triton, CoreOS OnMetal, etc.). The results by Felter et al. are mentioned but not put into perspective. A comparison of how the networking is affected by VM vs. bare-metal hosting would be a welcome addition, although AWS would probably not be a suitable environment due to ECS running atop EC2.
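To make that measurement methodology concrete, here is a minimal, self-contained sketch in the spirit of ppbench (not the actual tool): it posts messages of increasing size to a local HTTP echo endpoint and reports an average transfer rate per message size. The endpoint, the message sizes and the repetition count are illustrative assumptions.

```python
# Minimal sketch of a ppbench-style round-trip benchmark (illustrative, not the
# paper's actual tool): POST messages of increasing size to a local HTTP echo
# endpoint and report the average transfer rate per message size.
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)  # echo the payload back

    def log_message(self, *args):
        pass  # silence per-request logging


def benchmark(url, sizes, repetitions=20):
    """Return bytes transferred per second (request + response) per message size."""
    rates = {}
    for size in sizes:
        payload = b"x" * size
        start = time.perf_counter()
        for _ in range(repetitions):
            with urllib.request.urlopen(url, data=payload) as response:
                response.read()
        elapsed = time.perf_counter() - start
        rates[size] = 2 * size * repetitions / elapsed
    return rates


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    for size, rate in benchmark(url, sizes=[1_000, 100_000, 300_000]).items():
        print("%8d B messages: %.1f MB/s" % (size, rate / 1e6))
    server.shutdown()
```

Pointing the same client loop at an endpoint behind an SDVN overlay instead of the local baseline is the kind of comparison the paper's experiments make.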
Sinha et al. introduce a variant of adversarial training based on distributionally robust optimization. I strongly recommend reading the paper to understand the introduced theoretical framework. The authors also provide guarantees on the obtained adversarial loss, and show experimentally that this guarantee is a realistic indicator. The adversarial training variant itself follows the general strategy of training on adversarially perturbed training samples in a min-max framework. In each iteration, an attacker crafts an adversarial example on which the network is then trained. In a nutshell, their approach differs from previous ones (apart from the theoretical framework) in the attacker that is used. Specifically, their attacker optimizes $\arg\max_z l(\theta, z) - \gamma \|z - z^t\|_p^2$ where $z^t$ is a training sample chosen randomly during training. On a side note, I also recommend reading the reviews of this paper: https://openreview.net/forum?id=Hk6kPgZA-

Also view this summary at [davidstutz.de](https://davidstutz.de/category/reading/).
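A minimal PyTorch sketch of this penalized inner maximization (with $p = 2$), assuming a differentiable classifier `model` and a loss `loss_fn`; the plain gradient-ascent attacker and all hyperparameters are illustrative placeholders, not the exact optimization procedure from the paper.

```python
# Sketch of the penalized inner maximization: gradient ascent on
# l(theta, z) - gamma * ||z - z^t||_2^2, starting from the training sample z^t.
# Step size, step count and gamma are placeholder values.
import torch


def attack(model, loss_fn, x_t, y_t, gamma=1.0, steps=15, lr=0.1):
    z = x_t.clone().detach().requires_grad_(True)
    for _ in range(steps):
        objective = loss_fn(model(z), y_t) - gamma * ((z - x_t) ** 2).sum()
        grad, = torch.autograd.grad(objective, z)
        with torch.no_grad():
            z += lr * grad  # ascent step on the penalized surrogate
    return z.detach()


def training_step(model, loss_fn, optimizer, x_t, y_t, gamma=1.0):
    """One adversarial training step: train on the perturbed sample instead of x_t."""
    z_adv = attack(model, loss_fn, x_t, y_t, gamma=gamma)
    optimizer.zero_grad()
    loss = loss_fn(model(z_adv), y_t)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The penalty term keeps the adversarial sample close to the randomly chosen training sample $z^t$, which is what distinguishes this attacker from unconstrained loss maximization.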
Bechtle et al. propose meta learning via learned loss ($ML^3$) and derive and empirically evaluate the framework on classification, regression, model-based and model-free reinforcement learning tasks. The problem is formalized as learning the parameters $\Phi$ of a meta loss function $M_{\Phi}$ that computes loss values $L_{learned} = M_{\Phi}(y, f_{\theta}(x))$. Following the outer-inner loop meta-algorithm design, the learned loss $L_{learned}$ is used to update the parameters of the learner in the inner loop via gradient descent: $\theta_{new} = \theta - \alpha \nabla_{\theta}L_{learned}$.

The key contribution of the paper is the way a differentiable learning signal for the loss parameters $\Phi$ is constructed. The framework requires a task loss $L_T$ to be specified at meta-train time, which can for example be the mean squared error for regression tasks. After updating the model parameters to $\theta_{new}$, the task loss is used to measure how much learning progress has been made with loss parameters $\Phi$. The key insight is the decomposition of $\nabla_{\Phi} L_T(y, f_{\theta_{new}})$ via the chain rule:

$\nabla_{\Phi} L_T(y, f_{\theta_{new}}) = \nabla_f L_T \, \nabla_{\theta_{new}}f_{\theta_{new}} \, \nabla_{\Phi} \theta_{new} = \nabla_f L_T \, \nabla_{\theta_{new}}f_{\theta_{new}} \, \nabla_{\Phi}\left[\theta - \alpha \nabla_{\theta} \mathbb{E}[M_{\Phi}(y, f_{\theta}(x))]\right]$.

This allows the loss parameters to be updated with gradient descent: $\Phi_{new} = \Phi - \eta \nabla_{\Phi} L_T(y, f_{\theta_{new}})$. These update rules yield the following $ML^3$ algorithm for supervised learning tasks (a minimal code sketch is given at the end of this summary):

https://i.imgur.com/tSaTbg8.png

For reinforcement learning, the task loss is the expected future reward of the policy induced by $\pi_{\theta}$: computed with respect to the approximate dynamics model in the model-based case, and via a system-independent surrogate in the model-free case: $L_T(\pi_{\theta_{new}}) = -\mathbb{E}_{\pi_{\theta_{new}}} \left[ R(\tau_{\theta_{new}}) \log \pi_{\theta_{new}}(\tau_{\theta_{new}})\right]$.

The framework further allows extra information to be incorporated at train time via an additional loss term $L_{extra}$, using the augmented task loss $\beta L_T + \gamma L_{extra}$ with weights $\beta, \gamma$. Possible extra loss terms add physics priors, encourage exploratory behavior, or incorporate expert demonstrations. The experiments show that this information, although unavailable at test time, is retained in the shape of the learned loss landscape.

The paper is packed with insightful experiments and shows that the learned loss function:

- yields better accuracies on train and test tasks in regression and classification,
- generalizes well and speeds up learning in model-based RL tasks,
- yields better generalization and faster learning in model-free RL,
- is agnostic to the learner architecture across the evaluated depths (2, 3, 4, 5 layers),
- with incorporated extra knowledge yields better performance than without, and is superior to alternative approaches like iLQR in a MountainCar task.

The paper introduces a promising alternative to MAML-like approaches that learn the model parameters: learning the loss parameters instead. It would be interesting to see whether the learned loss function generalizes better than learned model parameters to a broader distribution of tasks, such as the meta-world tasks.
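The following is a minimal PyTorch sketch of the $ML^3$ inner/outer loop for a toy regression learner $f_\theta(x) = x\theta$; the meta-loss architecture, the use of a single inner step, and all hyperparameters are illustrative assumptions rather than the paper's setup.

```python
# Minimal sketch of the ML^3 inner/outer loop for 1-d linear regression.
# The meta loss M_Phi is a small MLP over (y_hat, y); the task loss L_T is MSE
# and is only used at meta-train time. All sizes/learning rates are placeholders.
import torch
import torch.nn as nn

torch.manual_seed(0)

meta_loss = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1), nn.Softplus())
meta_opt = torch.optim.Adam(meta_loss.parameters(), lr=1e-3)
task_loss = nn.MSELoss()   # L_T, available only at meta-train time
alpha = 0.1                # inner-loop learning rate

# Toy task: recover theta* = 2.0 from noisy observations.
x = torch.randn(64, 1)
y = 2.0 * x + 0.01 * torch.randn(64, 1)

for outer_step in range(500):
    theta = torch.zeros(1, 1, requires_grad=True)  # fresh learner each outer step

    # Inner step: update theta with the *learned* loss, keeping the graph so
    # that theta_new stays differentiable w.r.t. the meta-loss parameters Phi.
    y_hat = x @ theta
    learned = meta_loss(torch.cat([y_hat, y], dim=1)).mean()
    grad_theta, = torch.autograd.grad(learned, theta, create_graph=True)
    theta_new = theta - alpha * grad_theta

    # Outer step: evaluate the task loss after the inner update and
    # backpropagate through the update into Phi.
    meta_opt.zero_grad()
    task = task_loss(x @ theta_new, y)
    task.backward()
    meta_opt.step()
```

The `create_graph=True` call is what implements the chain-rule decomposition above: the task loss gradient flows through $\theta_{new}$ back into $\Phi$.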
This method improves the speed of R-CNN \cite{conf/cvpr/GirshickDDM14}:

1. Where R-CNN has two different objective functions, Fast R-CNN combines the localization and classification losses into a single "multi-task loss" in order to speed up training.
2. It also uses a pooling method based on \cite{journals/pami/HeZR015}, the RoI pooling layer, which pools each region proposal to a fixed spatial size so that images do not have to be rescaled to a fixed size before being fed into the CNN. "RoI max pooling works by dividing the $h \times w$ RoI window into an $H \times W$ grid of sub-windows of approximate size $h/H \times w/W$ and then max-pooling the values in each sub-window into the corresponding output grid cell." (A code sketch of this operation follows below.)
3. Backpropagation through the RoI pooling layer routes each output cell's gradient back to the input position that was the argmax in the forward pass, accumulating over all RoIs that overlap that position.

This method is further improved by the paper "Faster R-CNN" \cite{conf/nips/RenHGS15}.
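A small sketch of the quoted RoI max pooling operation, assuming a single feature map, integer RoI coordinates and no spatial-scale handling; it only illustrates the grid-and-max idea, not the full layer (batching, coordinate scaling, backprop bookkeeping).

```python
# Sketch of RoI max pooling as quoted above: split the h x w RoI window into an
# H x W grid of sub-windows and max-pool each one.
import math
import torch


def roi_max_pool(feature_map, roi, output_size=(7, 7)):
    """feature_map: (channels, height, width); roi: (x1, y1, x2, y2) in feature-map pixels."""
    x1, y1, x2, y2 = roi
    region = feature_map[:, y1:y2, x1:x2]
    channels, h, w = region.shape
    H, W = output_size
    out = region.new_zeros(channels, H, W)
    for i in range(H):
        for j in range(W):
            # sub-window of approximate size h/H x w/W
            ys, ye = math.floor(i * h / H), math.ceil((i + 1) * h / H)
            xs, xe = math.floor(j * w / W), math.ceil((j + 1) * w / W)
            out[:, i, j] = region[:, ys:ye, xs:xe].amax(dim=(1, 2))
    return out


features = torch.randn(512, 38, 50)                  # conv feature map
pooled = roi_max_pool(features, roi=(5, 3, 33, 24))  # -> (512, 7, 7)
print(pooled.shape)
```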
Lee et al. propose a variant of adversarial training where a generator is trained simultaneously to generate adversarial perturbations. This approach follows the idea that it is possible to “learn” how to generate adversarial perturbations (as in [1]). In this case, the authors use the gradient of the classifier with respect to the input as a hint for the generator. Both generator and classifier are then trained in an adversarial setting (analogously to generative adversarial networks); see the paper for details.

[1] Omid Poursaeed, Isay Katsman, Bicheng Gao, Serge Belongie. Generative Adversarial Perturbations. ArXiv, abs/1712.02328, 2017.
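Below is a rough sketch of such a joint training loop: the generator maps the classifier's input gradient to a perturbation, the classifier is then trained on the perturbed samples. The simple MLP architectures, the perturbation budget `epsilon` and the alternating updates are illustrative assumptions, not the exact objective or architecture from the paper.

```python
# Sketch of jointly training a perturbation generator and a classifier.
# The generator receives the classifier's input gradient as a hint.
import torch
import torch.nn as nn
import torch.nn.functional as F

classifier = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
generator = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
opt_c = torch.optim.Adam(classifier.parameters(), lr=1e-4)
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
epsilon = 0.1  # perturbation budget (placeholder)


def train_step(x, y):
    # Input gradient of the classification loss, used as the generator's hint.
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(classifier(x), y)
    grad, = torch.autograd.grad(loss, x)

    # Generator step: maximize the classifier's loss on the perturbed input.
    delta = epsilon * generator(grad).view_as(x)
    opt_g.zero_grad()
    g_loss = -F.cross_entropy(classifier(x.detach() + delta), y)
    g_loss.backward()
    opt_g.step()

    # Classifier step: minimize the loss on the (fixed) perturbed input.
    opt_c.zero_grad()
    c_loss = F.cross_entropy(classifier(x.detach() + delta.detach()), y)
    c_loss.backward()
    opt_c.step()
    return c_loss.item()


# Example with random data standing in for a real dataset.
x_batch, y_batch = torch.rand(32, 1, 28, 28), torch.randint(0, 10, (32,))
print(train_step(x_batch, y_batch))
```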