Efficient Estimation of Word Representations in Vector Space
Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean
arXiv e-Print archive - 2013 via Local arXiv
Keywords: cs.CL
First published: 2013/01/16
Abstract: We propose two novel model architectures for computing continuous vector
representations of words from very large data sets. The quality of these
representations is measured in a word similarity task, and the results are
compared to the previously best performing techniques based on different types
of neural networks. We observe large improvements in accuracy at much lower
computational cost, i.e., it takes less than a day to learn high-quality word
vectors from a 1.6-billion-word data set. Furthermore, we show that these
vectors provide state-of-the-art performance on our test set for measuring
syntactic and semantic word similarities.