A Comparison of Word Embeddings for the Biomedical Natural Language Processing
Yanshan Wang, Sijia Liu, Naveed Afzal, Majid Rastegar-Mojarad, Liwei Wang, Feichen Shen, Paul Kingsbury, Hongfang Liu
arXiv e-Print archive - 2018
Keywords: cs.IR
First published: 2018/02/01
Abstract: Neural word embeddings have been widely used in biomedical Natural Language
Processing (NLP) applications as they provide vector representations of words
capturing the semantic properties of words and the linguistic relationship
between words. Many biomedical applications use different textual resources
(e.g., Wikipedia and biomedical articles) to train word embeddings and apply
these word embeddings to downstream biomedical applications. However, there has
been little work on evaluating the word embeddings trained from these
resources. In this study, we provide an empirical evaluation of word embeddings
trained from four different resources, namely clinical notes, biomedical
publications, Wikipedia, and news. We performed the evaluation qualitatively
and quantitatively. In the qualitative evaluation, we manually inspected the five
most similar medical words for each of a set of target medical words, and then
analyzed the word embeddings through visualization. In the quantitative
evaluation, we conducted both intrinsic and extrinsic evaluations.
Based on the evaluation results, we can draw the following conclusions. First,
the word embeddings trained on EHR and PubMed capture the semantics of medical
terms better, find more relevant similar medical terms, and are closer to human
experts' judgments than the GloVe and Google News embeddings. Second, there is
no consistent global ranking of word embedding quality across downstream
biomedical NLP applications. However, adding word embeddings as extra features
improves results on most downstream tasks. Finally, word embeddings trained on
biomedical domain corpora do not necessarily outperform those trained on general
domain corpora on every downstream biomedical NLP task.
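For context, the intrinsic evaluation mentioned in the abstract is typically done by correlating embedding-based similarity with human expert ratings. A minimal sketch, assuming gensim `KeyedVectors` and a hypothetical file of expert-rated medical term pairs (the paper's actual benchmarks and file formats are not shown here):

```python
from gensim.models import KeyedVectors
from scipy.stats import spearmanr

# Hypothetical paths: a pretrained embedding file and a file of
# expert-rated medical term pairs ("term1 term2 rating" per line).
vectors = KeyedVectors.load_word2vec_format("ehr_word2vec.bin", binary=True)

model_sims, human_sims = [], []
with open("expert_similarity_pairs.txt") as f:
    for line in f:
        w1, w2, rating = line.split()
        if w1 in vectors and w2 in vectors:
            model_sims.append(vectors.similarity(w1, w2))  # cosine similarity
            human_sims.append(float(rating))

# Intrinsic evaluation: rank correlation between model and expert judgments.
rho, p = spearmanr(model_sims, human_sims)
print(f"Spearman rho = {rho:.3f} (p = {p:.3g}, n = {len(model_sims)})")
```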
This paper demonstrates that Word2Vec \cite{1301.3781} can extract relationships between words and produce latent representations useful for medical data. The authors explore this model on different corpora, which yield different relationships between words.
https://i.imgur.com/hSA61Zw.png
The Word2Vec model works like an autoencoder that predicts the context of a word. The context of a word consists of its surrounding words, as shown below. Given the word in the center, the neighboring words are predicted through a bottleneck, as in an autoencoder. Because a word appears in many different contexts across a corpus, the model can never reach zero error; instead it must minimize the reconstruction error, and in doing so it learns the latent representation.
https://i.imgur.com/EMtjTHn.png
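To make the skip-gram objective concrete, here is a minimal training sketch using gensim's `Word2Vec` on a toy corpus; the corpus, hyperparameters, and query word below are illustrative assumptions, not the paper's actual setup.

```python
from gensim.models import Word2Vec

# Toy corpus standing in for clinical notes / PubMed abstracts; each
# document is a list of tokens. Real corpora are orders of magnitude larger.
corpus = [
    ["patient", "presented", "with", "acute", "myocardial", "infarction"],
    ["aspirin", "was", "given", "to", "treat", "myocardial", "infarction"],
    ["metformin", "is", "prescribed", "for", "type", "2", "diabetes"],
]

# Skip-gram (sg=1): predict surrounding words from the center word
# through a low-dimensional hidden layer (the "bottleneck").
model = Word2Vec(
    sentences=corpus,
    vector_size=100,  # dimensionality of the latent representation
    window=5,         # context: up to 5 words on each side
    sg=1,             # 1 = skip-gram, 0 = CBOW
    min_count=1,
    epochs=50,
)

vec = model.wv["infarction"]          # the learned 100-dimensional vector
print(vec.shape)
print(model.wv.most_similar("infarction", topn=5))
```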
Subjectively, we can observe the relationships between word vectors:
https://i.imgur.com/8C9EVq1.png
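The same kind of inspection can be reproduced by querying nearest neighbors from embeddings trained on different corpora, which is how the paper's qualitative comparison works. A sketch, assuming gensim `KeyedVectors` and hypothetical embedding file names:

```python
from gensim.models import KeyedVectors

# Hypothetical embedding files trained on different corpora; the paper
# compares EHR, PubMed, GloVe, and Google News embeddings in this spirit.
sources = {
    "EHR": "ehr_word2vec.bin",
    "PubMed": "pubmed_word2vec.bin",
    "GoogleNews": "GoogleNews-vectors-negative300.bin",
}

target = "diabetes"
for name, path in sources.items():
    vectors = KeyedVectors.load_word2vec_format(path, binary=True)
    if target in vectors:
        neighbors = [w for w, _ in vectors.most_similar(target, topn=5)]
        print(f"{name:12s} -> {', '.join(neighbors)}")
```

Embeddings trained on clinical or biomedical text tend to return clinical neighbors (e.g., related conditions and drugs), while general-domain embeddings return more colloquial associations, which is exactly the behavior the qualitative evaluation highlights.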