First published: 2014/09/26

_Abstract:_ Top-performing deep architectures are trained on massive amounts of labeled
data. In the absence of labeled data for a certain task, domain adaptation
often provides an attractive option given that labeled data of similar nature
but from a different domain (e.g. synthetic images) are available. Here, we
propose a new approach to domain adaptation in deep architectures that can be
trained on large amounts of labeled data from the source domain and large
amounts of unlabeled data from the target domain (no labeled target-domain
data is necessary).
As the training progresses, the approach promotes the emergence of "deep"
features that are (i) discriminative for the main learning task on the source
domain and (ii) invariant with respect to the shift between the domains. We
show that this adaptation behaviour can be achieved in almost any feed-forward
model by augmenting it with a few standard layers and a simple new gradient
reversal layer. The resulting augmented architecture can be trained using
standard backpropagation.
Overall, the approach can be implemented with little effort using any of the
deep-learning packages. The method performs very well in a series of image
classification experiments, achieving an adaptation effect in the presence of
large domain shifts and outperforming the previous state of the art on the
Office datasets.
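A minimal PyTorch sketch of the gradient reversal layer mentioned in the abstract: identity on the forward pass, gradient multiplied by a negative weight on the backward pass. The `lambd` weight and the function names are illustrative, not the paper's exact API.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Gradient reversal: identity forward, -lambd * grad backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)  # identity on the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing back to the features;
        # None corresponds to the non-tensor lambd argument.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)
```

Placed between the feature extractor and a domain classifier, this makes standard backpropagation train domain-invariant features, which is why the augmented architecture needs no special optimizer.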
_Objective:_ Build a network easily trainable by back-propagation to perform unsupervised domain adaptation while at the same time learning a good embedding for both source and target domains.
_Dataset:_ [SVHN](ufldl.stanford.edu/housenumbers/), [MNIST](yann.lecun.com/exdb/mnist/), [USPS](https://www.otexts.org/1577), [CIFAR](https://www.cs.toronto.edu/%7Ekriz/cifar.html) and [STL](https://cs.stanford.edu/%7Eacoates/stl10/).
#### Architecture:
Very similar to RevGrad but with some differences: essentially a shared encoder whose features feed both a classifier and a reconstructor (decoder).
[![screen shot 2017-05-22 at 6 11 22 pm](https://cloud.githubusercontent.com/assets/17261080/26318076/21361592-3f1a-11e7-9213-9cc07cfe2f2a.png)](https://cloud.githubusercontent.com/assets/17261080/26318076/21361592-3f1a-11e7-9213-9cc07cfe2f2a.png)
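A minimal sketch of the shared-encoder architecture in the figure above, written in PyTorch. It assumes 32x32 single-channel inputs; every layer size here is illustrative, not the paper's exact configuration.

```python
import torch.nn as nn

class Encoder(nn.Module):
    """Shared feature extractor used by both heads."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 32x32 -> 16x16
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 16x16 -> 8x8
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, feat_dim), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class Classifier(nn.Module):
    """Label-prediction head; returns logits (softmax lives in the loss)."""
    def __init__(self, feat_dim=256, n_classes=10):
        super().__init__()
        self.fc = nn.Linear(feat_dim, n_classes)

    def forward(self, z):
        return self.fc(z)

class Reconstructor(nn.Module):
    """Decoder head mapping features back to pixel space."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 64 * 8 * 8)
        self.net = nn.Sequential(
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),  # 8x8 -> 16x16
            nn.ConvTranspose2d(32, 1, 2, stride=2),              # 16x16 -> 32x32
        )

    def forward(self, z):
        return self.net(self.fc(z))
```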
The two losses are:
* the usual cross-entropy with softmax for the classifier
* the pixel-wise squared loss for reconstruction
These are then combined using a trade-off hyper-parameter that balances classification against reconstruction, as in the sketch below.
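A minimal sketch of the combined objective, reusing the modules from the architecture sketch above. It assumes the classification loss is computed on labeled source images and the reconstruction loss on unlabeled target images, and the `trade_off` weighting is illustrative.

```python
import torch.nn.functional as F

def combined_loss(encoder, classifier, reconstructor,
                  x_src, y_src, x_tgt, trade_off=0.7):
    # Cross-entropy with softmax on labeled source images.
    logits = classifier(encoder(x_src))
    cls_loss = F.cross_entropy(logits, y_src)
    # Pixel-wise squared loss on (assumed) unlabeled target images.
    recon = reconstructor(encoder(x_tgt))
    rec_loss = F.mse_loss(recon, x_tgt)
    # Trade-off hyper-parameter balancing the two terms.
    return trade_off * cls_loss + (1.0 - trade_off) * rec_loss
```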
They also use data augmentation to generate additional training data during supervised training, using only geometric deformations: translation, rotation, skewing, and scaling.
In addition, they use denoising, reconstructing clean inputs from their noisy counterparts (zero-masked noise and Gaussian noise); a sketch of the two corruptions follows.
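A minimal sketch of the two corruption types used for the denoising reconstruction; the noise levels (`p`, `std`) are illustrative choices, not the paper's values.

```python
import torch

def zero_mask(x, p=0.3):
    # Zero-masked noise: randomly set a fraction p of pixels to zero.
    mask = (torch.rand_like(x) > p).float()
    return x * mask

def gaussian_noise(x, std=0.1):
    # Additive i.i.d. Gaussian noise, clamped back to the [0, 1] pixel range.
    return (x + std * torch.randn_like(x)).clamp(0.0, 1.0)

# The reconstructor is then trained to recover the clean image, e.g.
#   F.mse_loss(reconstructor(encoder(zero_mask(x))), x)
```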
#### Results:
Outperformed the state of the art on most tasks at the time; it has since itself been outperformed by Generate To Adapt on most tasks.