Unsupervised Domain Adaptation by Backpropagation
Yaroslav Ganin and Victor Lempitsky
arXiv e-Print archive - 2014 via Local arXiv
Keywords: stat.ML, cs.LG, cs.NE


Summary by Joseph Paul Cohen 4 years ago
I like the interpretation of minimising the classification loss under the constraint that the domain-conditional distributions of the internal representation (the learned features, marginalised over all $x$ from each domain) should match. This, though, could be better formulated as a soft constraint (as an optimisation problem devised in the paper): $$ \min_{\theta_f,\theta_y}-\mathbf{E}_x [\log p_{\theta_f,\theta_y}(y|x)] + \mathcal{D}(p_{\theta_f}(f|d=0)||p_{\theta_f}(f|d=1)) $$ where the first term is the standard probabilistic loss (the negative log-likelihood), regularised by the distance between the two domains' feature distributions. Since the domain label and image datapoint come in pairs, we can always marginalise out the data point to obtain $p(f|d)=\mathbf{E}_{p(x|d)}[p(f|x)]$. In our case, $p(f|x)$ is deterministic. The original authors use an "adversarial"-like methodology that introduces a discriminator for domain classification, in which case a possible choice of the distance $\mathcal{D}$ is the Jensen-Shannon divergence. Adversarial training makes it possible to train the feature extractor like a generator, matching the conditionals $p(f|d=0)$ and $p(f|d=1)$ using only samples.
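
To make the link between the domain discriminator and the Jensen-Shannon divergence concrete, here is a minimal sketch, not the paper's implementation. It assumes the features $f$ have been discretised into a handful of bins and that both domains are equally likely (both are simplifications for illustration; the paper uses continuous features and a neural domain classifier). All function and variable names are hypothetical. Under these assumptions, the cross-entropy of the best possible domain classifier equals $\log 2$ minus the Jensen-Shannon divergence, so a feature extractor that makes the domains indistinguishable is also driving that divergence to zero.

```python
import math


def kl(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p0, p1):
    """Jensen-Shannon divergence between p(f|d=0) and p(f|d=1), in nats."""
    m = [(a + b) / 2 for a, b in zip(p0, p1)]  # equal-weight mixture
    return 0.5 * kl(p0, m) + 0.5 * kl(p1, m)


def optimal_domain_loss(p0, p1):
    """Cross-entropy of the Bayes-optimal domain classifier (equal priors).

    For each feature bin, the optimal classifier outputs
    P(d=0 | f) = p0(f) / (p0(f) + p1(f)).
    """
    loss = 0.0
    for a, b in zip(p0, p1):
        if a + b == 0:
            continue  # bin never occurs in either domain
        d0 = a / (a + b)
        if a > 0:
            loss -= 0.5 * a * math.log(d0)
        if b > 0:
            loss -= 0.5 * b * math.log(1 - d0)
    return loss


# Hypothetical feature histograms for the source (d=0) and target (d=1) domains.
p_source = [0.7, 0.2, 0.1]
p_target = [0.1, 0.2, 0.7]

jsd = js_divergence(p_source, p_target)
best_loss = optimal_domain_loss(p_source, p_target)

# The identity: best achievable domain loss = log 2 - JSD.
print(jsd, best_loss, math.log(2) - jsd)
```

When the two histograms coincide, the divergence is zero and the best domain classifier can do no better than chance, which is the state the adversarially trained feature extractor is pushed towards.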
