Domain Adaptation with Randomized Multilinear Adversarial Networks on ShortScience.org


Domain Adaptation with Randomized Multilinear Adversarial Networks
Long, Mingsheng and Cao, Zhangjie and Wang, Jianmin and Jordan, Michael I.
arXiv e-Print archive - 2017 via Local Bibsonomy


Summary by Léo Paillier 6 years ago

_Objective:_ Perform domain adaptation by adapting several layers through a randomized representation, rather than adapting only the final layer, so that the joint distribution of the layers is aligned and not just the marginals.

_Datasets:_ [Office](https://cs.stanford.edu/%7Ejhoffman/domainadapt/) and [ImageCLEF-DA](http://imageclef.org/2014/adaptation).

## Inner-workings:

Basically an improvement on [RevGrad](https://arxiv.org/pdf/1505.07818.pdf) where, instead of feeding only the last embedding layer to the discriminator, several layers are used.  
Because the tensor product of all these layers has a dimension equal to the product of the layer dimensions, they instead use a randomized multilinear representation:  
[![screen shot 2017-06-01 at 5 35 46 pm](https://cloud.githubusercontent.com/assets/17261080/26687736/cff20446-46f0-11e7-918e-b60baa10aa67.png)](https://cloud.githubusercontent.com/assets/17261080/26687736/cff20446-46f0-11e7-918e-b60baa10aa67.png)  
Where:

*   d is the dimension of the embedding (they use 1024)
*   R is a random matrix whose elements each have zero mean and unit variance (Bernoulli, Gaussian and Uniform distributions are tried)
*   z^l is the activation of the l-th layer
*   ⊙ represents the Hadamard (element-wise) product

In practice they don't use all layers, only the last 3-4 layers for ResNet and AlexNet.
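
The randomized representation described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names (`full_tensor_dim`, `sample_projections`, `randomized_multilinear`), the 1/√d scaling, and the example layer widths are assumptions made here. Each layer is projected to d dimensions by its own fixed random matrix, and the projections are combined with a Hadamard product instead of a full tensor product.

```python
import math
import numpy as np


def full_tensor_dim(layer_dims):
    """Dimension of the explicit tensor product of the layers (product of widths)."""
    return math.prod(layer_dims)


def sample_projections(layer_dims, d=1024, dist="gaussian", rng=None):
    """Sample one random matrix per layer, with zero-mean, unit-variance entries.

    The matrices are meant to be sampled once and then kept fixed.
    """
    rng = np.random.default_rng() if rng is None else rng
    projections = []
    for d_l in layer_dims:
        if dist == "gaussian":
            R = rng.standard_normal((d_l, d))
        elif dist == "bernoulli":
            # +1 or -1 with equal probability: mean 0, variance 1
            R = rng.choice([-1.0, 1.0], size=(d_l, d))
        elif dist == "uniform":
            # Uniform on [-sqrt(3), sqrt(3)]: mean 0, variance 1
            R = rng.uniform(-math.sqrt(3), math.sqrt(3), size=(d_l, d))
        else:
            raise ValueError(f"unknown distribution: {dist}")
        projections.append(R)
    return projections


def randomized_multilinear(layers, projections):
    """Combine layer activations of shape (batch, d_l) into one (batch, d) array."""
    d = projections[0].shape[1]
    out = np.ones((layers[0].shape[0], d))
    for z, R in zip(layers, projections):
        out = out * (z @ R)  # Hadamard product of the projected layers
    return out / math.sqrt(d)  # scaling assumed here


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dims = [256, 31]  # hypothetical layer widths, for illustration only
    layers = [rng.standard_normal((4, d_l)) for d_l in dims]
    fused = randomized_multilinear(layers, sample_projections(dims, rng=rng))
    print(full_tensor_dim(dims), fused.shape)
```

With these hypothetical widths the explicit tensor product already has 256 × 31 = 7936 dimensions and grows multiplicatively with every extra layer, whereas the randomized representation stays at d regardless of how many layers are fused.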

## Architecture:

[![screen shot 2017-06-01 at 5 34 44 pm](https://cloud.githubusercontent.com/assets/17261080/26687686/acce0d98-46f0-11e7-89d1-15452cbb527e.png)](https://cloud.githubusercontent.com/assets/17261080/26687686/acce0d98-46f0-11e7-89d1-15452cbb527e.png)

They use the usual losses for domain adaptation, with:

*   the network F minimizing the cross-entropy loss for classification and trying to reduce the gap between the distributions (as indicated by D).
*   the discriminator D maximizing the gap between the distributions.

[![screen shot 2017-06-01 at 5 40 53 pm](https://cloud.githubusercontent.com/assets/17261080/26687936/8575ff70-46f1-11e7-917d-05129ab190b0.png)](https://cloud.githubusercontent.com/assets/17261080/26687936/8575ff70-46f1-11e7-917d-05129ab190b0.png)
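
The two opposing objectives can be written out as a small NumPy sketch. It assumes a standard RevGrad-style formulation, which may differ in detail from the paper's exact loss: the discriminator D outputs the probability that a sample comes from the source domain, its loss is a binary cross-entropy, and a trade-off weight `lam` (a name introduced here for illustration) balances the two terms for F.

```python
import numpy as np


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def domain_loss(d_source, d_target, eps=1e-12):
    """Binary cross-entropy of D, with source labeled 1 and target labeled 0.

    d_source and d_target are D's predicted probabilities of "source".
    A low value means D separates the two domains well.
    """
    source_term = np.log(d_source + eps).mean()
    target_term = np.log(1.0 - d_target + eps).mean()
    return -(source_term + target_term) / 2.0


def objectives(class_logits, labels, d_source, d_target, lam=1.0):
    """Return (loss_F, loss_D); each player minimizes its own loss."""
    cls = cross_entropy(class_logits, labels)
    dom = domain_loss(d_source, d_target)
    loss_F = cls - lam * dom  # F: classify well AND make D fail
    loss_D = dom              # D: tell source from target
    return loss_F, loss_D
```

In an actual training loop the subtraction in `loss_F` is usually realized with a gradient reversal layer, as in RevGrad, rather than by computing two separate losses.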

## Results:

Improves on state-of-the-art results for most tasks in both datasets, and is very easy to implement on top of any pre-trained network out of the box.
