On-line Active Reward Learning for Policy Optimisation in Spoken Dialogue Systems
Su, Pei-Hao and Gasic, Milica and Mrksic, Nikola and Rojas-Barahona, Lina Maria and Ultes, Stefan and Vandyke, David and Wen, Tsung-Hsien and Young, Steve J.
Association for Computational Linguistics - 2016 via Local Bibsonomy
Keywords: dblp
The goal is to improve the training process for a spoken dialogue system, specifically a telephone-based system that provides restaurant information for the Cambridge (UK) area. The authors train a supervised model that predicts whether the current dialogue was successful. If the model is confident about the outcome, its predicted label is used as the reward signal for training the dialogue policy; if it is uncertain, the user is asked to provide the label. In effect, active learning reduces the amount of annotation required by choosing which dialogues actually need to be labelled.
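The query-or-predict decision can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code: it assumes the success model outputs a probability `p_success`, treats the model as uncertain when that probability lies within a hypothetical `margin` of 0.5, and uses a hypothetical `ask_user` callback to stand in for the user's feedback.

```python
from typing import Callable, List, Tuple


def success_label(
    p_success: float,
    ask_user: Callable[[], bool],
    margin: float = 0.3,
) -> Tuple[bool, bool]:
    """Return (label, queried) for one finished dialogue.

    Hypothetical uncertainty rule (an assumption, not from the paper):
    the model counts as uncertain when its predicted success probability
    lies within `margin` of 0.5.
    """
    if abs(p_success - 0.5) < margin:
        # Uncertain: ask the user for the success label (active-learning query).
        return ask_user(), True
    # Confident: use the model's own prediction as the label.
    return p_success >= 0.5, False


# Usage: three dialogues; the simulated user reports success whenever asked.
predictions = [0.95, 0.55, 0.1]
labels: List[Tuple[bool, bool]] = [
    success_label(p, ask_user=lambda: True) for p in predictions
]
queries = sum(queried for _, queried in labels)
# labels → [(True, False), (True, True), (False, False)]; queries → 1
```

Only the middle dialogue, whose predicted probability is close to 0.5, triggers a query; the other two are labelled by the model alone, which is where the annotation savings come from.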
https://i.imgur.com/dWY1EdE.png
Each dialogue is mapped to a fixed-length vector representation by a bidirectional LSTM trained as an autoencoder, and a Gaussian process over these vectors models dialogue success.
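To show how a Gaussian process can supply both a success prediction and the uncertainty that drives the user queries, here is a small NumPy sketch. It is a simplification, not the paper's model: it uses GP regression on +1/-1 success labels with an RBF kernel as a stand-in for proper GP classification (which needs an approximate posterior such as Laplace or EP), squashes the predictive mean and variance into a probability with a probit-style formula, and uses random vectors in place of the BiLSTM embeddings. The class name `GPSuccessModel`, the length scale, and the noise level are all hypothetical.

```python
import math
from typing import Tuple

import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, length_scale: float) -> np.ndarray:
    """Squared-exponential kernel between the rows of `a` and the rows of `b`."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


class GPSuccessModel:
    """Hypothetical GP regression on +1/-1 labels (stand-in for GP classification)."""

    def __init__(self, length_scale: float = 1.0, noise: float = 0.1):
        self.length_scale = length_scale
        self.noise = noise

    def fit(self, X: np.ndarray, y: np.ndarray) -> "GPSuccessModel":
        self.X = X
        K = rbf_kernel(X, X, self.length_scale) + self.noise * np.eye(len(X))
        self.L = np.linalg.cholesky(K)
        # alpha = K^{-1} y, computed via two solves with the Cholesky factor.
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))
        return self

    def predict(self, X_new: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        """Posterior mean and variance of the latent function at X_new."""
        K_s = rbf_kernel(X_new, self.X, self.length_scale)
        mean = K_s @ self.alpha
        v = np.linalg.solve(self.L, K_s.T)
        # Prior variance k(x, x) = 1 for this RBF kernel.
        var = 1.0 - np.sum(v**2, axis=0)
        return mean, np.maximum(var, 0.0)

    def predict_proba(self, X_new: np.ndarray) -> np.ndarray:
        """Heuristic success probability: Phi(mean / sqrt(1 + var))."""
        mean, var = self.predict(X_new)
        z = mean / np.sqrt(1.0 + var)
        return np.array([0.5 * (1.0 + math.erf(zi / math.sqrt(2.0))) for zi in z])


# Usage with toy "embeddings": successful dialogues cluster around +1,
# failed ones around -1 (random data standing in for BiLSTM vectors).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 1.0, (20, 4)), rng.normal(-1.0, 1.0, (20, 4))])
y = np.concatenate([np.ones(20), -np.ones(20)])
model = GPSuccessModel(length_scale=2.0).fit(X, y)
queries_X = np.array([[1.0] * 4, [-1.0] * 4, [0.0] * 4])
probs = model.predict_proba(queries_X)
```

A point inside the "successful" cluster gets a probability above 0.5 and one inside the "failed" cluster gets one below 0.5; a point halfway between them would typically land nearer 0.5, which is the kind of dialogue a threshold rule on the probability would send to the user for a label.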