Comparing Data Sources and Architectures for Deep Visual Representation Learning in Semantics on ShortScience.org


Comparing Data Sources and Architectures for Deep Visual Representation Learning in Semantics
Kiela, Douwe and Vero, Anita Lilla and Clark, Stephen
Empirical Methods in Natural Language Processing (EMNLP) - 2016 via Local Bibsonomy
Keywords: dblp

Summaries/Notes 1

Summary by Marek Rei 8 years ago

The authors compare different image recognition models and image data sources for multimodal word representation learning.

https://i.imgur.com/iHwCSks.png

Image recognition models used for vector generation
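The summary does not say how per-image network outputs become a single visual vector for a word. The following is a minimal sketch that assumes each word has several retrieved images, and that each image has already been passed through one of the networks above to give a fixed-length feature vector. The word's visual vector is then an aggregate of those image vectors. Mean and elementwise max pooling are shown as two common choices. The function names are hypothetical, and this is not presented as the paper's exact pipeline.

```python
from typing import List, Sequence


def _check(image_features: Sequence[Sequence[float]]) -> int:
    """Validate the input and return the shared feature dimension."""
    if not image_features:
        raise ValueError("need at least one image feature vector")
    dim = len(image_features[0])
    if any(len(f) != dim for f in image_features):
        raise ValueError("all feature vectors must have the same length")
    return dim


def mean_pool(image_features: Sequence[Sequence[float]]) -> List[float]:
    """Visual word vector as the elementwise mean of per-image CNN features."""
    dim = _check(image_features)
    n = len(image_features)
    return [sum(f[i] for f in image_features) / n for i in range(dim)]


def max_pool(image_features: Sequence[Sequence[float]]) -> List[float]:
    """Visual word vector as the elementwise maximum of per-image CNN features."""
    dim = _check(image_features)
    return [max(f[i] for f in image_features) for i in range(dim)]


# Toy example: two images for one word, 3-dimensional features (made-up numbers).
feats = [[1.0, 0.0, 2.0], [3.0, 4.0, 0.0]]
print(mean_pool(feats))  # [2.0, 2.0, 1.0]
print(max_pool(feats))   # [3.0, 4.0, 2.0]
```

Because the aggregation only sees fixed-length vectors, the same code works unchanged whichever architecture (AlexNet, GoogLeNet, VGGNet) produced the features.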

Experiments are performed on SimLex-999 (similarity) and MEN (relatedness). The three architectures (AlexNet, GoogLeNet, VGGNet) perform quite similarly, with VGGNet slightly ahead at the cost of more computation. Among image sources, search engines give good coverage, ImageNet performs quite well in combination with VGGNet, and the ESP Game dataset gives the lowest performance. Combining visual and linguistic vectors is beneficial for both English and Italian.
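The following sketch shows how the two modalities could be combined and scored against SimLex-999 or MEN. It makes several assumptions. First, fusion is done by concatenating L2-normalized linguistic and visual vectors, with a hypothetical weight `alpha`. Second, word-pair similarity is cosine similarity. Third, evaluation is the Spearman rank correlation between model scores and human ratings, the usual metric for these datasets. The summary itself does not specify the fusion method. All function names and toy numbers are illustrative, not taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0.0 else [x / norm for x in v]


def fuse(linguistic: Sequence[float], visual: Sequence[float],
         alpha: float = 0.5) -> List[float]:
    """Concatenate normalized vectors; alpha weights the linguistic half."""
    return ([alpha * x for x in l2_normalize(linguistic)]
            + [(1.0 - alpha) * x for x in l2_normalize(visual)])


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


def evaluate(pairs: Sequence[Tuple[str, str]], gold: Sequence[float],
             vectors: Dict[str, Sequence[float]]) -> float:
    """Correlate model cosine similarities with human ratings for word pairs."""
    predicted = [cosine(vectors[a], vectors[b]) for a, b in pairs]
    return spearman(predicted, gold)


# Made-up toy vectors and ratings, only to exercise the pipeline.
ling = {"cat": [1.0, 0.0], "dog": [0.8, 0.2], "car": [0.0, 1.0]}
vis = {"cat": [1.0, 1.0], "dog": [1.0, 0.8], "car": [0.0, 1.0]}
fused = {w: fuse(ling[w], vis[w]) for w in ling}
pairs = [("cat", "dog"), ("cat", "car"), ("dog", "car")]
gold = [9.0, 1.0, 2.0]
print(round(evaluate(pairs, gold, fused), 6))  # 1.0
```

Normalizing each modality before concatenation keeps one vector space from dominating the cosine simply because its vectors have larger norms. `alpha` then controls the balance between the two modalities explicitly.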
