First published: 2016/11/25
Abstract: Adversarial training has been shown to produce state of the art results for
generative image modeling. In this paper we propose an adversarial training
approach to train semantic segmentation models. We train a convolutional
semantic segmentation network along with an adversarial network that
discriminates segmentation maps coming either from the ground truth or from the
segmentation network. The motivation for our approach is that it can detect and
correct higher-order inconsistencies between ground truth segmentation maps and
the ones produced by the segmentation net. Our experiments show that our
adversarial training approach leads to improved accuracy on the Stanford
Background and PASCAL VOC 2012 datasets.
# Semantic Segmentation using Adversarial networks
## Luc, Couprie, Chintala, Verbeek, 2016
* The paper aims to improve segmentation performance (IoU) by adding an adversarial network to the training of the segmentation network.
* The authors draw their intuition from GANs, where a game is played between a generator and a discriminator.
* In this work, the game works as follows: a segmentation network maps an HxWx3 image to an HxWxC label map, and a discriminator CNN is trained to distinguish the generated label maps from the ground truth. The game is adversarial because the segmentor aims to produce _more realistic_ label maps, while the discriminator aims to tell them apart from the ground truth.
* The discriminator is a CNN that maps from HxWxC to a binary label.
* Section 3.2 outlines three ways to feed the label maps to the discriminator:
    * __Basic__: the label maps are concatenated to the image and fed to the discriminator. The authors observe, however, that leaving the image out does not change performance, so in the end they feed only the label maps for _Basic_.
    * __Product__: each of the C label maps is multiplied with each of the three input image channels, giving an input of 3C channels.
    * __Scaling__: resembles _Basic_, but the one-hot ground-truth distribution is perturbed slightly. This prevents the discriminator from trivially telling predictions and ground truth apart by their entropy instead of learning anything useful.
* The discriminator architecture is varied along two axes, giving four architectures:
    * __FOV__: a field of view of either 18x18 or 34x34 over the label map
    * __Light__: a variant with more or less capacity, e.g. more or fewer channels
* The paper reports reasonable improvements on the Stanford Background dataset, but keep in mind that it contains only about 700 images.
* The gains on PASCAL VOC 2012 are minor, with IoU improving from 71.8 to 72.0.
* The authors tried to pretrain the adversary, but found that this made training unstable. They instead alternate between training the segmentor and the discriminator, and found that slow alternations work best.
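The adversarial game described above can be written as a hybrid loss. Below is a minimal, dependency-free Python sketch. It assumes a label map is flattened to a list of per-pixel class distributions and that the discriminator outputs a single probability that its input is ground truth. The weight `lam` and all function names are hypothetical. The segmentor's adversarial term uses the common non-saturating GAN form (pushing D(pred) towards 1); treat that as an assumption rather than the paper's exact formulation.

```python
import math
from typing import List

# Hypothetical representation: a label map flattened to H*W pixels, each pixel
# a list of C class probabilities.
LabelMap = List[List[float]]

EPS = 1e-12  # guards against log(0)


def multiclass_cross_entropy(pred: LabelMap, target: LabelMap) -> float:
    """Per-pixel cross-entropy between target and predicted distributions, summed over pixels."""
    return -sum(
        t * math.log(p + EPS)
        for pred_px, target_px in zip(pred, target)
        for p, t in zip(pred_px, target_px)
    )


def binary_cross_entropy(prob: float, label: int) -> float:
    """BCE for one discriminator output; `prob` is P(input is ground truth)."""
    return -(label * math.log(prob + EPS) + (1 - label) * math.log(1.0 - prob + EPS))


def discriminator_loss(d_on_truth: float, d_on_pred: float) -> float:
    """The discriminator wants 1 on ground-truth maps and 0 on segmentor output."""
    return binary_cross_entropy(d_on_truth, 1) + binary_cross_entropy(d_on_pred, 0)


def segmentor_loss(pred: LabelMap, target: LabelMap, d_on_pred: float, lam: float = 0.1) -> float:
    """Segmentation term plus an adversarial term that rewards fooling the discriminator.

    Non-saturating form: push D(pred) towards 1. `lam` is a hypothetical weight.
    """
    return multiclass_cross_entropy(pred, target) + lam * binary_cross_entropy(d_on_pred, 1)


# Usage: two pixels, C = 2.
truth = [[1.0, 0.0], [0.0, 1.0]]
pred = [[0.8, 0.2], [0.3, 0.7]]
print(segmentor_loss(pred, truth, d_on_pred=0.4))
print(discriminator_loss(d_on_truth=0.9, d_on_pred=0.4))
```

In this sketch, alternating training means minimising `discriminator_loss` for a number of steps with the segmentor fixed, then `segmentor_loss` with the discriminator fixed.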
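The _Product_ and _Scaling_ discriminator inputs can be sketched with the same flattened per-pixel representation. The scaling rule used here is an assumption and should be checked against Section 3.2 for the exact definition. In this rule, the true class gets max(tau, s), where s is the segmentor's probability for that class, and the remaining mass is redistributed in proportion to the segmentor's probabilities. The value `tau = 0.9` is also an assumption. The function names are hypothetical.

```python
from typing import List

Pixel = List[float]


def product_encoding(rgb: List[Pixel], label_map: List[Pixel]) -> List[Pixel]:
    """'Product' input: every class map times every RGB channel, giving 3*C channels per pixel."""
    return [
        [p * v for p in probs for v in colour]
        for colour, probs in zip(rgb, label_map)
    ]


def scaled_ground_truth(true_labels: List[int], seg_probs: List[Pixel], tau: float = 0.9) -> List[Pixel]:
    """'Scaling'-style softening of a one-hot ground truth (assumed rule).

    The true class gets max(tau, s), where s is the segmentor's probability for it;
    the remaining mass goes to the other classes in proportion to the segmentor's
    own probabilities, so each pixel still sums to 1.
    """
    out = []
    for label, probs in zip(true_labels, seg_probs):
        top = max(tau, probs[label])
        rest = 1.0 - probs[label]  # mass the segmentor put on the wrong classes
        pixel = []
        for c, p in enumerate(probs):
            if c == label:
                pixel.append(top)
            elif rest > 0:
                pixel.append(p * (1.0 - top) / rest)
            else:
                pixel.append(0.0)
        out.append(pixel)
    return out


rgb = [[0.5, 0.2, 1.0]]    # one pixel
probs = [[0.6, 0.3, 0.1]]  # C = 3 predicted distribution for that pixel
print(len(product_encoding(rgb, probs)[0]))  # 9 = 3 * C
print(scaled_ground_truth([0], probs))
```

The softened target keeps the true class dominant but is no longer exactly one-hot, so entropy alone no longer separates real from generated maps.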
# Simultaneous Deep transfer across domains and tasks
## Tzeng, Hoffman, Saenko, 2015
* The paper aims to exploit unlabeled and sparsely labeled data from the target domain.
* As a baseline, they mention that one could match feature distributions between the source and target domains. This work additionally exploits correlations between categories, such as _bottle_ and _mug_.
* The authors draw inspiration from the _Name the dataset_ game by Torralba and Efros, in which a classifier is trained to predict which dataset an image comes from. This idea underlies the domain confusion loss. A domain classifier measures how well the learned features of the source and target domains can be told apart. The image classifier learns a feature representation that makes the domains indistinguishable, as measured by that domain classifier.
* The second idea transfers the similarity structure between categories to the target domain. It works as follows: _We first compute the average output probability distribution, or “softlabel,” over the source training examples in each category. Then, for each target labeled example, we directly optimize our model to match the distribution over classes to the soft label. In this way we are able to perform task adaptation by transferring information to categories with no explicit labels in the target domain._
* The experiments cover two settings: the _supervised_ case, where only a few labels are present in the target domain, and the _semi-supervised_ case, where only a few labels for a subset of the classes are present.
* In the final section, the authors analyze their own results. They show that the image classifier correctly labels _monitor_ images, even though no labels for _monitor_ were present in the target domain.
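The domain confusion idea can be sketched as two losses over the domain classifier's output distribution. The domain classifier minimises ordinary cross-entropy against the true domain. The feature extractor minimises cross-entropy against a uniform distribution over domains, which is smallest when the domains are indistinguishable. This is a minimal stdlib-only Python sketch, and the function names are hypothetical.

```python
import math
from typing import List

EPS = 1e-12  # guards against log(0)


def domain_classifier_loss(domain_probs: List[float], true_domain: int) -> float:
    """Cross-entropy for the domain classifier, which tries to 'name the dataset'."""
    return -math.log(domain_probs[true_domain] + EPS)


def domain_confusion_loss(domain_probs: List[float]) -> float:
    """Cross-entropy against a uniform target over domains.

    The feature extractor minimises this; it is smallest when the classifier
    outputs the same probability for every domain.
    """
    n = len(domain_probs)
    return -sum(math.log(q + EPS) for q in domain_probs) / n


# Domains: 0 = source, 1 = target.
confident = [0.95, 0.05]  # features still reveal the domain
confused = [0.5, 0.5]     # features no longer reveal the domain
print(domain_confusion_loss(confident), domain_confusion_loss(confused))
```

The two objectives pull in opposite directions, so in a setup like this they would typically be optimised in alternation.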
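The soft-label procedure quoted in the notes can be sketched as follows. Logits stand in for the network's pre-softmax outputs. Softening the distributions with a temperature, as in knowledge distillation, is an assumption here, and the default of 2.0 is arbitrary. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

EPS = 1e-12  # guards against log(0) after underflow


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-softened softmax; a higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def compute_soft_labels(
    source_logits: Sequence[Sequence[float]],
    source_labels: Sequence[int],
    temperature: float = 2.0,
) -> Dict[int, List[float]]:
    """Average the softened output distribution over the source examples of each category."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for logits, label in zip(source_logits, source_labels):
        probs = softmax(logits, temperature)
        acc = sums.setdefault(label, [0.0] * len(probs))
        for i, p in enumerate(probs):
            acc[i] += p
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def soft_label_loss(target_logits: Sequence[float], soft_label: Sequence[float], temperature: float = 2.0) -> float:
    """Cross-entropy between a category's soft label and the softened prediction for a labeled target example."""
    probs = softmax(target_logits, temperature)
    return -sum(q * math.log(p + EPS) for q, p in zip(soft_label, probs))


# Source examples: logits over 3 classes (e.g. bottle, mug, monitor) with their labels.
source_logits = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [2.0, 1.0, 0.0]]
source_labels = [0, 1, 0]
soft = compute_soft_labels(source_logits, source_labels, temperature=1.0)
# A labeled target example of class 0 is pushed towards soft[0], not towards a one-hot vector.
print(soft_label_loss([1.5, 0.5, -0.5], soft[0], temperature=1.0))
```

Because the target is a full distribution, a target example of one class also carries information about how similar it should look to the other classes.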
It's not clear to me how predicting the variance with a neural network gives a robust estimate of uncertainty. Adversarial examples show that a neural network can be fooled by an input that is only slightly perturbed. By the same argument, we could craft adversarial examples to _fool_ the uncertainty estimator. I would like to see more work on this.