Summary by Marek Rei 8 years ago
They describe a version of reinforcement learning where the system also learns to solve some auxiliary tasks, which helps with the main objective.
https://i.imgur.com/fmTVxvr.png
In addition to normal Q-learning, which predicts the downstream reward, the system also learns 1) a separate policy for maximally changing the pixels on the screen, 2) a policy for maximally activating units in a hidden layer, and 3) a predictor of the reward at the next step, trained with biased (reward-skewed) sampling. They show that these auxiliary tasks improve both learning speed and final performance on Atari games and Labyrinth (a Quake-like 3D environment).
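As a rough illustration of the first auxiliary task (pixel control), the auxiliary reward for each spatial cell can be taken as the average absolute change in pixel intensity between consecutive frames; this is a minimal sketch, and the grid size and grayscale input are assumptions for illustration, not the paper's exact setup:

```python
import numpy as np

def pixel_change_reward(frame_prev, frame_next, cells=4):
    """Auxiliary reward per spatial cell: mean absolute pixel change.

    frames: (H, W) grayscale arrays with H and W divisible by `cells`.
    Returns a (cells, cells) grid of per-cell change magnitudes.
    """
    diff = np.abs(frame_next.astype(np.float32) - frame_prev.astype(np.float32))
    h, w = diff.shape
    ch, cw = h // cells, w // cells
    # Average the change within each cell of a cells x cells grid
    return diff.reshape(cells, ch, cells, cw).mean(axis=(1, 3))

# Toy example: the change is confined to the top-left 2x2 region,
# so only the top-left cell of the 4x4 grid receives a reward
prev = np.zeros((8, 8))
nxt = np.zeros((8, 8))
nxt[:2, :2] = 1.0
r = pixel_change_reward(prev, nxt, cells=4)  # r[0, 0] == 1.0, all other cells 0
```

An auxiliary policy trained to maximize this per-cell signal is pushed to take actions that visibly affect the environment, which is the intuition behind the pixel-control task.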