End-to-end LSTM-based dialog control optimized with supervised and reinforcement learning
Jason D. Williams
and
Geoffrey Zweig
arXiv e-Print archive - 2016
Keywords:
cs.CL, cs.AI, cs.LG
First published: 2016/06/03
Abstract: This paper presents a model for end-to-end learning of task-oriented dialog
systems. The main component of the model is a recurrent neural network (an
LSTM), which maps from raw dialog history directly to a distribution over
system actions. The LSTM automatically infers a representation of dialog
history, which relieves the system developer of much of the manual feature
engineering of dialog state. In addition, the developer can provide software
that expresses business rules and provides access to programmatic APIs,
enabling the LSTM to take actions in the real world on behalf of the user. The
LSTM can be optimized using supervised learning (SL), where a domain expert
provides example dialogs which the LSTM should imitate; or using reinforcement
learning (RL), where the system improves by interacting directly with end
users. Experiments show that SL and RL are complementary: SL alone can derive a
reasonable initial policy from a small number of training dialogs; and starting
RL optimization with a policy trained with SL substantially accelerates the
learning rate of RL.
TLDR; The authors present an end-to-end dialog system consisting of an LSTM, action templates, an entity extraction system, and custom code for declaring business rules. They test the system on a toy task where the goal is to call a person from an address book. They train the system on 21 example dialogs with Supervised Learning, then optimize it with Reinforcement Learning, achieving a 70% task completion rate.
#### Key Points
- Task: User asks to call person. Action: Find in address book and place call
- 21 example dialogs
- Several hundred lines of Python code to block certain actions
- External entity recognition API
- Hand-crafted features as input to the LSTM; hand-crafted action templates as output.
- The LSTM maps the dialog history to a distribution over action templates. It is first pre-trained to reproduce the example dialogs with Supervised Learning, then further trained with RL / policy gradient (see the code sketch after this list).
- The system doesn't generate text; it picks a template.
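
The sketch below (mine, not the authors' code) shows one way the pieces above could fit together: an LSTM that maps hand-crafted per-turn features to a distribution over action templates, a business-rule mask applied before the softmax, supervised pre-training on the example dialogs, and a REINFORCE-style policy-gradient update. PyTorch, all layer sizes, and all names (`DialogPolicy`, `sl_step`, `rl_step`) are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of an LSTM dialog policy over action templates.
# Hypothetical names and shapes; PyTorch is used only as a stand-in.
import torch
import torch.nn as nn


class DialogPolicy(nn.Module):
    def __init__(self, feat_dim: int, hidden_dim: int, num_templates: int):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_templates)

    def forward(self, turn_feats, action_mask):
        # turn_feats:  (batch, turns, feat_dim)       hand-crafted features per turn
        # action_mask: (batch, turns, num_templates)  1 = allowed by business rules
        h, _ = self.lstm(turn_feats)
        logits = self.out(h)
        # Business-rule code masks out disallowed templates before the softmax,
        # so forbidden actions get zero probability.
        logits = logits.masked_fill(action_mask == 0, float("-inf"))
        return torch.log_softmax(logits, dim=-1)


policy = DialogPolicy(feat_dim=64, hidden_dim=128, num_templates=20)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)


# --- Supervised learning: imitate the example dialogs ---
def sl_step(turn_feats, action_mask, expert_actions):
    log_probs = policy(turn_feats, action_mask)
    loss = nn.NLLLoss()(log_probs.flatten(0, 1), expert_actions.flatten())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


# --- Reinforcement learning: REINFORCE-style policy gradient on dialog return ---
def rl_step(turn_feats, action_mask, taken_actions, dialog_return):
    log_probs = policy(turn_feats, action_mask)
    chosen = log_probs.gather(-1, taken_actions.unsqueeze(-1)).squeeze(-1)
    loss = -(dialog_return * chosen.sum())  # return-weighted log-likelihood
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Masking before the softmax means the same network can be trained with SL first and fine-tuned with RL afterwards without ever assigning probability to actions the developer's business rules forbid.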
#### Notes
- I wonder how well the system would generalize to a task with a larger action space and more varied conversations. The 21 provided dialogs already cover a lot of this task's space; that would be much harder to achieve in larger spaces.
- I wouldn't call this approach end-to-end ;)