_Objective:_ Perform domain adaptation by adapting several layers through a randomized representation rather than only the final layer, thereby aligning the joint distribution and not just the marginals.
_Datasets:_ [Office](https://cs.stanford.edu/%7Ejhoffman/domainadapt/) and [ImageCLEF-DA](http://imageclef.org/2014/adaptation).
## Inner-workings:
Essentially an improvement on [RevGrad](https://arxiv.org/pdf/1505.07818.pdf): instead of feeding only the last embedding layer to the discriminator, several layers are used.
To avoid the dimension explosion that the tensor product of all layers would cause, they use a randomized multilinear representation instead:
[![screen shot 2017-06-01 at 5 35 46 pm](https://cloud.githubusercontent.com/assets/17261080/26687736/cff20446-46f0-11e7-918e-b60baa10aa67.png)](https://cloud.githubusercontent.com/assets/17261080/26687736/cff20446-46f0-11e7-918e-b60baa10aa67.png)
Where:
* d is the dimension of the embedding (they use 1024)
* R is a random matrix whose elements each have zero mean and unit variance (Bernoulli, Gaussian and Uniform distributions are tried)
* z^l is the l-th layer
* ⊙ represents the Hadamard product
In practice they don't use all layers, only the last 3-4 layers for ResNet and AlexNet.
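
The fusion step can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: it assumes the representation has the form (1/√d) · ⊙_l (R^l z^l), with one fixed random matrix per layer. The function name `randomized_multilinear`, the layer widths and the batch size are hypothetical; only d = 1024 comes from the notes above.

```python
import numpy as np

def randomized_multilinear(layers, random_mats):
    """Fuse per-layer activations into one d-dimensional vector per sample.

    layers:      list of arrays, each of shape (batch, d_l)
    random_mats: list of arrays, each of shape (d_l, d); sampled once and kept fixed
    """
    d = random_mats[0].shape[1]
    fused = np.ones((layers[0].shape[0], d))
    for z, R in zip(layers, random_mats):
        # Project each layer to d dimensions, then combine with a Hadamard product.
        fused *= z @ R
    return fused / np.sqrt(d)

rng = np.random.default_rng(0)
d = 1024                  # embedding dimension used in the paper
layer_dims = [256, 31]    # hypothetical widths of the adapted layers
batch = 4                 # hypothetical batch size

layers = [rng.standard_normal((batch, n)) for n in layer_dims]
# Gaussian entries with zero mean and unit variance (one of the three options tried).
random_mats = [rng.standard_normal((n, d)) for n in layer_dims]

fused = randomized_multilinear(layers, random_mats)
print(fused.shape)
```

The output size stays at d regardless of how many layers are fused, whereas an explicit tensor product would have size equal to the product of all layer widths (256 × 31 here).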
## Architecture:
[![screen shot 2017-06-01 at 5 34 44 pm](https://cloud.githubusercontent.com/assets/17261080/26687686/acce0d98-46f0-11e7-89d1-15452cbb527e.png)](https://cloud.githubusercontent.com/assets/17261080/26687686/acce0d98-46f0-11e7-89d1-15452cbb527e.png)
They use the usual losses for domain adaptation, with:
* F minimizing the cross-entropy loss for classification and trying to reduce the gap between the distributions (indicated by D).
* D maximizing the gap between the distributions.
[![screen shot 2017-06-01 at 5 40 53 pm](https://cloud.githubusercontent.com/assets/17261080/26687936/8575ff70-46f1-11e7-917d-05129ab190b0.png)](https://cloud.githubusercontent.com/assets/17261080/26687936/8575ff70-46f1-11e7-917d-05129ab190b0.png)
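
The two competing objectives can be sketched as below. This is a hedged illustration in the RevGrad style rather than the exact loss from the paper: it assumes D outputs the probability that a sample comes from the source domain, measures the "gap" by D's binary cross-entropy, and uses a hypothetical trade-off weight `lam`. All function names are made up for this example.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of integer labels under class probabilities."""
    probs = np.clip(probs, 1e-12, 1.0)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

def binary_cross_entropy(p, targets):
    """Mean binary cross-entropy between probabilities p and 0/1 targets."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.mean(targets * np.log(p) + (1 - targets) * np.log(1 - p))

def adversarial_objectives(class_probs_src, labels_src, d_out_src, d_out_tgt, lam=1.0):
    """Return (loss_F, loss_D), each to be minimized by its own network.

    d_out_src / d_out_tgt: D's predicted probability of "source" for the
    fused representations of source / target samples.
    """
    cls_loss = cross_entropy(class_probs_src, labels_src)
    d_out = np.concatenate([d_out_src, d_out_tgt])
    domain = np.concatenate([np.ones_like(d_out_src), np.zeros_like(d_out_tgt)])
    dom_loss = binary_cross_entropy(d_out, domain)
    loss_F = cls_loss - lam * dom_loss  # F: classify well and confuse D
    loss_D = dom_loss                   # D: tell source from target
    return loss_F, loss_D
```

When D can no longer separate the domains (outputs of 0.5 everywhere), its loss sits at log 2, which is the best F can achieve on the adversarial term under these assumptions.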
## Results:
Improves on state-of-the-art results for most tasks in both datasets. It is also very easy to implement on top of any pre-trained network, out of the box.