Summary by CodyWild 6 years ago
This paper is, on the whole, a refreshing jaunt into the applied side of the research world. It isn't looking to solve a fundamental machine learning problem in some new way, but it does highlight and explore one potentially beneficial application of a common and widely used technique: specifically, combining word embeddings with context-free grammars (regular expressions being the familiar example) to make the latter less rigid.
Regular expressions work by specifying hardcoded patterns of symbols and matching against any strings in some search set that fit those patterns. They don't need to spell out exact characters - they can work at higher levels of generality, like "uppercase alphabetic character" or "any character" - but they're still fundamentally hardcoded, in that the designer of the expression needs to write a specification that affirmatively catches all the desired cases. This can be a particularly challenging task when you're trying to find - for example - all sentences that match the pattern of someone giving someone else a compliment. You might want to match against "I think you're smart" and also "I think you're clever". However, in the normal use of regular expressions, something like this would be nearly impossible to specify, short of writing out every synonym for "intelligent" that you can think of.
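To make that rigidity concrete, here's a minimal sketch (my own illustration, not from the paper) of the purely hardcoded approach: the pattern only ever matches the synonyms you explicitly thought to enumerate.

```python
import re

# A conventional regex must list every acceptable synonym by hand;
# any phrasing outside the hardcoded alternation is silently missed.
compliment = re.compile(r"I think you'?re (smart|clever|intelligent|bright)")

print(bool(compliment.search("I think you're clever")))     # True
print(bool(compliment.search("I think you're brilliant")))  # False: not in the list
```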
The "Embedding Grammars" paper proposes a solution to this problem: instead of enumerating a list of synonyms, simply provide one example term, or, even better, a few examples, and use those terms' word embedding representations to define a "synonym bubble" (my word, not theirs) in continuous space around those examples. This is based on the oft-remarked-upon fact that, because word embedding systems are generally trained to push together words that can be used in similar contexts, closeness in word vector space frequently corresponds to words being synonyms, or close in some other sense. So, if you "match" to any term that is sufficiently near to your exemplar terms, you are performing something similar to the task of enumerating all of a term's synonyms. Once this general intuition is in hand, the details of the approach are fairly straightforward: the authors try a few approaches, and find that constructing a bubble of some radius epsilon around each example's word vector, and matching to anything inside that bubble, works best.
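Here's a rough sketch of the bubble-matching idea, again my own illustration rather than the paper's code: the embedding table is a tiny hand-made stand-in for a real pretrained model (word2vec, GloVe, etc.), and the epsilon value is made up, so treat it as a demonstration of the mechanics only.

```python
import re
import numpy as np

# Toy embedding table for illustration; in practice these vectors would come
# from a pretrained model such as word2vec or GloVe.
EMBEDDINGS = {
    "smart":     np.array([0.90, 0.10, 0.00]),
    "clever":    np.array([0.85, 0.15, 0.05]),
    "brilliant": np.array([0.80, 0.20, 0.10]),
    "tall":      np.array([0.10, 0.90, 0.30]),
}

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def in_synonym_bubble(word, exemplars, epsilon=0.15):
    """True if `word` lies within cosine distance epsilon of any exemplar."""
    if word not in EMBEDDINGS:
        return False
    vec = EMBEDDINGS[word]
    return any(1.0 - cosine(vec, EMBEDDINGS[e]) <= epsilon
               for e in exemplars if e in EMBEDDINGS)

def matches_compliment(sentence, exemplars=("smart", "clever")):
    """Crude stand-in for an embedding-augmented grammar rule: match
    "I think you're <ADJ>" where <ADJ> falls inside the synonym bubble."""
    m = re.match(r"I think you'?re (\w+)", sentence)
    return bool(m) and in_synonym_bubble(m.group(1).lower(), exemplars)

print(matches_compliment("I think you're brilliant"))  # True: inside the bubble
print(matches_compliment("I think you're tall"))       # False: outside the bubble
```

The threshold epsilon is doing all the work here, which is exactly the knob the skepticism below is about: set it too generously and the bubble starts admitting words you didn't intend.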
https://i.imgur.com/j9OSNuE.png
Overall, this seems like a clever idea; one imagines that word embeddings will keep branching out into ever more far-flung applications as time goes on. There are reasons to be skeptical of this paper, though. Fundamentally, word embedding space is a "here there be dragons" kind of place: we may be able to observe broad patterns, and might be able to say that "nearby words tend to be synonyms," but we can't give any kind of guarantee of that being the case. As an example of this problem, the nearest neighbors of a word, after its direct synonyms, are often its direct antonyms, so if you set too loose a threshold, you'll potentially match to words with exactly the opposite meaning of what you expect. We are probably still a ways away from systems like this one being broadly useful, for this and other reasons, but I do think it's valuable to try to understand what questions we'd want answered, and what features of embedding space we'd want more elucidated, before applications like these become stably usable.