Semantic Segmentation using Adversarial Networks
Pauline Luc, Camille Couprie, Soumith Chintala, Jakob Verbeek
arXiv e-Print archive - 2016 via Local arXiv
Keywords: cs.CV
First published: 2016/11/25
Abstract: Adversarial training has been shown to produce state of the art results for
generative image modeling. In this paper we propose an adversarial training
approach to train semantic segmentation models. We train a convolutional
semantic segmentation network along with an adversarial network that
discriminates segmentation maps coming either from the ground truth or from the
segmentation network. The motivation for our approach is that it can detect and
correct higher-order inconsistencies between ground truth segmentation maps and
the ones produced by the segmentation net. Our experiments show that our
adversarial training approach leads to improved accuracy on the Stanford
Background and PASCAL VOC 2012 datasets.
# Semantic Segmentation using Adversarial Networks
## Luc, Couprie, Chintala, Verbeek, 2016
* The paper aims to improve segmentation performance (IoU) by training the segmentation network together with an adversarial network.
* The authors derive their intuition from GANs, where a game is played between a generator and a discriminator.
* In this work, the game works as follows: a segmentation network maps an image of size HxWx3 to a label map of size HxWxC, and a discriminator CNN is tasked with telling the generated label maps apart from the ground truth. It is an adversarial game because the segmentor aims to produce label maps that look _more real_, while the discriminator aims to distinguish them from the ground truth.
* The discriminator is a CNN that maps an HxWxC label map to a binary label (ground truth or generated).
* Section 3.2 outlines three ways to feed the label maps to the discriminator:
  * __Basic__, where the label maps are concatenated to the image and fed to the discriminator. The authors observe that leaving the image out does not change performance, so they end up feeding only the label maps for _basic_.
  * __Product__, where the label maps and the input image are multiplied, leading to an input with 3C channels.
  * __Scaling__, which resembles basic, but the one-hot ground-truth distribution is perturbed slightly. This prevents the discriminator from trivially telling the two sources apart by their entropy (one-hot versus soft) rather than learning anything useful.
* The discriminator is constructed along two axes of variation, leading to 4 architectures:
  * __FOV__: a field of view of either 18x18 or 34x34 over the label map
  * __light__: an architecture with more or less capacity, e.g. in the number of channels
* The paper shows fair results on the Stanford Background dataset, but keep in mind that it only contains about 700 images.
* The gains on the PASCAL VOC 2012 dataset are minor, with the IoU improving from 71.8 to 72.0.
* The authors tried to pretrain the adversary, but found that this led to unstable training. They ended up training the segmentor and the discriminator in an alternating scheme, and found that slow alternations work best.
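
The adversarial game from the notes above can be written down as two losses. The following is a minimal sketch, assuming the hybrid objective as I understand it from the paper: the segmentor minimizes a per-pixel cross-entropy plus a weighted term that rewards fooling the discriminator, while the discriminator minimizes a binary cross-entropy. All function names and the weight `lam` are illustrative assumptions, and the discriminator outputs are plain probabilities rather than the outputs of a real CNN.

```python
import math

EPS = 1e-12  # guards against log(0)

def bce(p, target):
    """Binary cross-entropy for one predicted probability p and a 0/1 target."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))

def pixel_cross_entropy(probs, labels):
    """Mean multi-class cross-entropy over pixels.

    probs:  list of per-pixel class distributions
    labels: list of ground-truth class indices, one per pixel
    """
    total = sum(math.log(max(p[y], EPS)) for p, y in zip(probs, labels))
    return -total / len(labels)

def discriminator_loss(d_real, d_fake):
    """d_real / d_fake: probability of 'ground truth' that the discriminator
    assigns to a ground-truth map and to a predicted map, respectively."""
    return bce(d_real, 1) + bce(d_fake, 0)

def segmentor_loss(probs, labels, d_fake, lam=1.0):
    """Segmentation loss plus a term that is small when the discriminator
    mistakes the predicted map for ground truth. `lam` is a hypothetical weight."""
    return pixel_cross_entropy(probs, labels) + lam * bce(d_fake, 1)

# Toy example: two pixels, three classes.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
labels = [0, 1]
fooled = segmentor_loss(probs, labels, d_fake=0.9)
caught = segmentor_loss(probs, labels, d_fake=0.1)
```

With the same segmentation output, `fooled` is smaller than `caught`, which is the pressure that pushes the segmentor towards label maps the discriminator accepts as real.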
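
To make the three discriminator input variants (basic, product, scaling) more concrete, here is a small NumPy sketch of the tensors involved. The function names, the channel ordering of the product input, and the threshold `tau` are assumptions for illustration. The scaling rule shown (put at least mass `tau` on the correct label and rescale the remaining predicted probabilities) is my reading of the paper and should be checked against its Section 3.2.

```python
import numpy as np

def one_hot(labels, num_classes):
    """(H, W) integer labels -> (H, W, C) one-hot label map."""
    return np.eye(num_classes)[labels]

def basic_input(label_map):
    """Basic: the (H, W, C) label map is fed to the discriminator as is."""
    return label_map

def product_input(image, label_map):
    """Product: every RGB channel times every class map -> (H, W, 3*C)."""
    h, w, c = label_map.shape
    prod = image[:, :, :, None] * label_map[:, :, None, :]  # (H, W, 3, C)
    return prod.reshape(h, w, 3 * c)

def scaling_target(pred, labels, tau=0.9):
    """Scaling: soften the ground truth so it resembles the prediction `pred`
    while keeping at least mass `tau` on the correct label."""
    idx = labels[..., None]                            # (H, W, 1)
    p_true = np.take_along_axis(pred, idx, axis=-1)    # predicted mass on the true label
    y_true = np.maximum(tau, p_true)
    # Rescale the other classes so each pixel still sums to one.
    scale = (1.0 - y_true) / np.maximum(1.0 - p_true, 1e-8)
    target = pred * scale
    np.put_along_axis(target, idx, y_true, axis=-1)
    return target

# Toy example: a 2x2 image with C = 3 classes.
rng = np.random.default_rng(0)
image = rng.random((2, 2, 3))
labels = np.array([[0, 1], [2, 1]])
truth = one_hot(labels, 3)
pred = np.full((2, 2, 3), 1.0 / 3.0)   # an uninformative prediction
prod = product_input(image, truth)      # shape (2, 2, 9)
soft = scaling_target(pred, labels)     # no longer exactly one-hot
```

The softened target `soft` still sums to one at every pixel, so the discriminator can no longer separate ground truth from predictions just by checking for exact zeros and ones.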
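
The alternating training scheme from the last note can be sketched as a simple schedule: the two networks take turns in blocks of `period` iterations, and a "slow" alternation corresponds to a large `period`. The function name, the choice of which network goes first, and the period used in the example are assumptions, not values taken from the notes.

```python
def network_to_update(step, period):
    """Return which network is trained at iteration `step` when the two
    networks alternate in blocks of `period` iterations."""
    if period <= 0:
        raise ValueError("period must be positive")
    # Even-numbered blocks train the segmentor, odd-numbered blocks the discriminator.
    return "segmentor" if (step // period) % 2 == 0 else "discriminator"

# A short schedule with a (hypothetical) period of 3 iterations.
schedule = [network_to_update(t, period=3) for t in range(8)]
```

In a real training loop, only the parameters of the returned network would be updated at that step, with the other network held fixed.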