Domain-Adversarial Training of Neural Networks
Ganin, Yaroslav
and
Ustinova, Evgeniya
and
Ajakan, Hana
and
Germain, Pascal
and
Larochelle, Hugo
and
Laviolette, François
and
Marchand, Mario
and
Lempitsky, Victor S.
Journal of Machine Learning Research - 2016 via Local Bibsonomy
Keywords:
dblp
_Objective:_ Find a feature representation that cannot discriminate between the training (source) and test (target) domains using a discriminator trained directly on this embedding.
_Dataset:_ MNIST, SYN Numbers, SVHN, SYN Signs, OFFICE, PRID, VIPeR and CUHK.
## Architecture:
The basic idea behind this paper is to use a standard classifier network and chose one layer that will be the feature representation. The network before this layer is called the `Feature Extractor` and after the `Label Predictor`. Then a new network called a `Domain Classifier` is introduced that takes as input the extracted feature, its objective is to tell if a computed feature embedding came from an image from the source or target dataset.
At training the aim is to minimize the loss of the `Label Predictor` while maximizing the loss of the `Domain Classifier`. In theory we should end up with a feature embedding where the discriminator can't tell if the image came from the source or target domain, thus the domain shift should have been eliminated.
To maximize the domain loss, a new layer is introduced, the `Gradient Reversal Layer` which is equal to the identity during the forward-pass but reverse the gradient in the back-propagation phase. This enables the network to be trained using simple gradient descent algorithms.
What is interesting with this approach is that any initial network can be used by simply adding a few new set of layers for the domain classifiers. Below is a generic architecture.
[![screen shot 2017-04-18 at 1 59 53 pm](https://cloud.githubusercontent.com/assets/17261080/25129680/590f57ee-243f-11e7-8927-91124303b584.png)](https://cloud.githubusercontent.com/assets/17261080/25129680/590f57ee-243f-11e7-8927-91124303b584.png)
## Results:
Their approach is working but for some domain adaptation it completely fails and overall its performance are not great. Since then the state-of-the-art has changed, see DANN combined with GAN or ADDA.