On-line Active Reward Learning for Policy Optimisation in Spoken Dialogue Systems on ShortScience.org

On-line Active Reward Learning for Policy Optimisation in Spoken Dialogue Systems
Su, Pei-Hao and Gasic, Milica and Mrksic, Nikola and Rojas-Barahona, Lina Maria and Ultes, Stefan and Vandyke, David and Wen, Tsung-Hsien and Young, Steve J.
Association for Computational Linguistics - 2016 via Local Bibsonomy

Summaries/Notes 1

Summary by Marek Rei 7 years ago

The goal is to improve the training process for a spoken dialogue system, specifically a telephone-based system that provides restaurant information for the Cambridge (UK) area. The authors train a supervised model that predicts whether the current dialogue was successful. If the model is confident about the outcome, its predicted label is used as the training signal for the dialogue system; if it is uncertain, the user is asked to provide a label. In effect, active learning selects which dialogues need annotating, which reduces the amount of annotation required.
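The labelling rule described above can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the function name `choose_label`, the `ask_user` callback, and the `margin` value are all hypothetical, and the summary does not say how confidence is actually thresholded.

```python
from typing import Callable, Tuple


def choose_label(
    p_success: float,
    ask_user: Callable[[], bool],
    margin: float = 0.35,  # hypothetical confidence margin, not from the paper
) -> Tuple[bool, bool]:
    """Return (label, queried) for one finished dialogue.

    p_success is the success model's predicted probability that the
    dialogue succeeded. If it is far enough from 0.5, the prediction
    is used as the label; otherwise the user is asked.
    """
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be a probability in [0, 1]")

    confident = abs(p_success - 0.5) >= margin
    if confident:
        # Use the model's own prediction; no annotation is requested.
        return p_success > 0.5, False

    # Uncertain: fall back to an explicit label from the user.
    return ask_user(), True


if __name__ == "__main__":
    # Confident prediction: the user is not queried.
    print(choose_label(0.95, ask_user=lambda: False))
    # Uncertain prediction: the user's answer becomes the label.
    print(choose_label(0.55, ask_user=lambda: False))
```

Only the dialogues that fall in the uncertain band cost a user annotation, which is where the saving comes from.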

https://i.imgur.com/dWY1EdE.png

The dialogue is mapped to a vector representation using a bidirectional LSTM trained like an autoencoder, and a Gaussian process defined over that representation models dialogue success.
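One common way to write such a success model is given below as a sketch. The probit link $\Phi$ and the squared exponential kernel with hyperparameters $p$ and $l$ are assumptions made for illustration; the summary itself does not specify them. Here $\mathbf{d}$ denotes the dialogue vector produced by the LSTM encoder and $y \in \{0, 1\}$ the success label.

$$ f \sim \mathcal{GP}(0, k), \qquad p(y = 1 \mid \mathbf{d}) = \Phi\big(f(\mathbf{d})\big) $$

$$ k(\mathbf{d}, \mathbf{d}') = p^2 \exp\left(-\frac{\lVert \mathbf{d} - \mathbf{d}' \rVert^2}{2 l^2}\right) $$

Under this formulation, a predicted probability close to $0.5$ corresponds to the uncertain case in which the user would be asked for a label.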
