Cooperative Inverse Reinforcement Learning
Hadfield-Menell, Dylan and Dragan, Anca and Abbeel, Pieter and Russell, Stuart J.
arXiv e-Print archive - 2016 via Local Bibsonomy
Keywords: dblp
In the future, AI and people will work together; hence, we must concern ourselves with ensuring that AI will have interests aligned with our own.
The authors suggest that it is in our best interests to find a solution to the "value-alignment problem". As recently pointed out by Ian Goodfellow, however,
[this may not always be a good idea](https://www.quora.com/When-do-you-expect-AI-safety-to-become-a-serious-issue).
Cooperative Inverse Reinforcement Learning (CIRL) formulates the interaction as a cooperative, partial-information game between a human and a robot. Both share a reward
function, but the robot does not initially know what it is. A key departure from classical Inverse Reinforcement Learning
is that the teacher (here, the human) is not assumed to act optimally with respect to immediate reward. Rather, the authors show that actions
that are sub-optimal in isolation can lead the robot to learn a better estimate of the reward function. The structure of the CIRL game thus discourages the
human from teaching by demonstrations that greedily maximize immediate reward; instead, the human learns how to "best respond" to the robot's learning process.
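For reference, the game has roughly the following structure (reconstructed from memory of the paper, so treat the exact notation as approximate): a two-player Markov game with identical payoffs,

```latex
M = \langle S,\ \{A^{H}, A^{R}\},\ T(\cdot \mid \cdot,\cdot,\cdot),\ \{\Theta,\ R(\cdot,\cdot,\cdot;\theta)\},\ P_0(\cdot,\cdot),\ \gamma \rangle
```

where $S$ is the state space, $A^{H}$ and $A^{R}$ are the human's and robot's action sets, $T$ is the transition distribution, $\Theta$ parameterizes the shared reward $R(s, a^{H}, a^{R}; \theta)$, $P_0$ is a joint distribution over the initial state and $\theta$ (only the human observes $\theta$), and $\gamma$ is the discount factor.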
CIRL can be formulated as a Dec-POMDP and reduced to a single-agent POMDP. The authors solve a 2D navigation task with CIRL to show that having the human follow a "demonstration-by-expert" policy is inferior to having the human follow a "best-response" policy.
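To make the expert-vs-pedagogic distinction concrete, here is a minimal toy sketch (not the paper's experiment; the reward hypotheses, Boltzmann-rational inference model, and all numbers are made up for illustration). The robot does a one-shot Bayesian IRL update from a single demonstration and then acts on its inferred reward; the "expert" human shows the demonstration with the highest immediate reward, while the "best-response" human shows the demonstration that leads to the best final outcome.

```python
import numpy as np

# Toy illustration of demonstration-by-expert vs. best-response teaching.
# All quantities below are hypothetical and chosen only to make the point.

theta_star = np.array([1.0, 0.9])            # true reward weights (known only to the human)
Theta = [theta_star, np.array([1.0, -1.0])]  # robot's hypothesis space over reward weights
prior = np.array([0.5, 0.5])                 # robot's prior over Theta

# Demonstrations the human can give, summarized by feature counts.
demos = [np.array([2.0, 0.0]), np.array([1.0, 1.0]), np.array([0.0, 2.0])]
# Options the robot can execute on its own afterwards.
deploy = [np.array([0.0, 3.0]), np.array([2.0, 0.0])]

beta = 2.0  # Boltzmann rationality the robot assumes for the human

def posterior(demo_idx):
    """Robot's Bayesian IRL update after observing one demonstration."""
    likes = []
    for theta in Theta:
        scores = np.exp(beta * np.array([d @ theta for d in demos]))
        likes.append(scores[demo_idx] / scores.sum())
    post = prior * np.array(likes)
    return post / post.sum()

def robot_choice(post):
    """Robot deploys the option maximizing posterior-expected reward."""
    values = [sum(p * (theta @ e) for p, theta in zip(post, Theta)) for e in deploy]
    return int(np.argmax(values))

def team_reward(demo_idx):
    """True reward eventually earned if the human shows this demonstration."""
    e = deploy[robot_choice(posterior(demo_idx))]
    return theta_star @ e

# Demonstration-by-expert: greedily show the highest-reward demonstration.
expert_demo = int(np.argmax([theta_star @ d for d in demos]))
# Best response: show the demonstration that leads to the best final outcome.
pedagogic_demo = int(np.argmax([team_reward(i) for i in range(len(demos))]))

print("expert demo:", expert_demo, "-> team reward", team_reward(expert_demo))
print("best-response demo:", pedagogic_demo, "-> team reward", team_reward(pedagogic_demo))
```

In this contrived setup the greedy demonstration is ambiguous between the two reward hypotheses, so the robot deploys conservatively, whereas the pedagogic (immediately sub-optimal) demonstration disambiguates the true reward and yields a higher final team reward, which is the qualitative effect the paper's best-response analysis captures.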