Planning for Autonomous Cars that Leverage Effects on Human Actions
Dorsa Sadigh, Shankar Sastry, Sanjit A. Seshia, Anca D. Dragan
Robotics: Science and Systems XII (2016)
## General Framework
*wording: car = the autonomous car, driver = the other car it is interacting with*
Builds a model of an **autonomous car's influence over the behavior of an interacting driver** (human or simulated) that the car can leverage to plan more efficiently. The driver is modeled as a policy that maximizes a defined objective. In brief, a **linear reward function is learned offline with IRL on human demonstrations**, and the modeled driver takes the actions that maximize its return under this reward function (computed with **Model Predictive Control (MPC)**).
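A minimal formalization of that driver model (notation mine; the listed features are illustrative examples of the hand-designed features, not the paper's exact list):

```latex
% Driver reward, linear in hand-designed features \phi
% (e.g. distance to lane center, distance to other cars, heading, speed):
r_H(x_t, u^R_t, u^H_t) = w^\top \phi(x_t, u^R_t, u^H_t)

% Modeled driver = best response to the car's plan u^R over the MPC horizon N:
\mathbf{u}^{H*}(x_0, \mathbf{u}^R)
  = \arg\max_{\mathbf{u}^H} \sum_{t=0}^{N-1} r_H(x_t, u^R_t, u^H_t)
```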
They show that a car planning with this driver model learns to **influence the driver, either towards specified behaviors** or in ways that **achieve a higher payoff for the car**. These results also hold when interacting with real human **drivers, who are only loosely approximated by the driver model**.
I believe the **key to this success lies in a learned reward that generalizes well** to new states (a linear function of a few clever hand-designed features) together with the **use of MPC**: by **modeling the driver's goals rather than its behavior, it captures a reasonable driver policy** (since the learned reward is accurate) and does not have to deal with concurrent learning dynamics. Imitation learning, by contrast, would try to match behavior instead of goals and might not generalize as well to new (unseen) situations. Additionally, MPC lets the car coordinate the car-driver interaction over an **extended time horizon** (through planning).
**This shows that leveraging an influence model is promising for communication emergence.** *Parallel with: Promoting influence like Jaques et al. is more effective than hoping agents figure it out by themselves like MADDPG*
## Motivations
Previous methods use simplistic influence models ("the driver will keep a constant velocity"), yet the car's behavior influences the driver whether the car is aware of it or not. Such a simplistic model only leads to "defensive behaviors" where the car avoids disturbing the driver; it therefore does not truly interact with the driver and yields suboptimal strategies. Additionally, a simplistic interaction model seems to lead exclusively to functional actions instead of communicative ones.
## Assumptions
* **Sequential decision making**: the car acts first, forcing a driver response, which in turn makes the world transition to a new state (this could be relaxed by considering influence across time steps: the car's action at time t influences the driver's action at time t+1)
* Approximates the **human as an optimal planner** (but with limited recursive induction) and as deterministic (no modeled uncertainty)
* The car uses **1st-order recursive induction** ("my action influences the driver, but the driver believes its own action won't influence me, i.e., that my plan is fixed no matter what it does"): *I see this as assuming a "defensive" driver and an "aggressive" car; see the sketch after this list.*
* **Fully observable** environment (not sure how difficult it would be to relax this)
* Model Predictive Control (MPC) with L-BFGS requires access to the **differentiable transition function**.
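To make the nested (Stackelberg-style) structure concrete, here is a minimal runnable sketch of the planning loop. Everything in it is a made-up stand-in: toy point-mass dynamics, toy rewards, and a short horizon. The paper instead uses learned IRL rewards, real vehicle dynamics, and symbolic Theano gradients, whereas scipy below falls back on finite differences.

```python
import numpy as np
from scipy.optimize import minimize

N = 5     # MPC horizon (toy value)
DT = 0.1  # integration step

def step(x, u_car, u_driver):
    # Toy point-mass dynamics: x = [car_pos, driver_pos], controls = velocities.
    return x + DT * np.array([u_car, u_driver])

def rollout(x0, u_car, u_driver):
    xs = [x0]
    for t in range(N):
        xs.append(step(xs[-1], u_car[t], u_driver[t]))
    return np.array(xs)

def driver_return(u_driver, x0, u_car):
    # Toy driver objective: make forward progress, keep clear of the car,
    # and avoid excessive control effort (keeps the optimum bounded).
    xs = rollout(x0, u_car, u_driver)
    progress = xs[-1, 1] - xs[0, 1]
    clearance = np.abs(xs[:, 0] - xs[:, 1]).min()
    effort = 0.5 * np.sum(u_driver ** 2)
    return progress + clearance - effort

def driver_best_response(x0, u_car):
    # Inner problem: the driver maximizes its return while treating the car's
    # plan as fixed (the "defensive driver" / 1st-order induction assumption).
    res = minimize(lambda u: -driver_return(u, x0, u_car),
                   np.zeros(N), method="L-BFGS-B")
    return res.x

def car_return(u_car, x0):
    # Outer problem: the car scores its plan *through* the driver's predicted
    # best response -- this is where influence enters the planning.
    u_driver = driver_best_response(x0, u_car)
    xs = rollout(x0, u_car, u_driver)
    effort = 0.5 * np.sum(u_car ** 2)
    return xs[-1, 0] - xs[0, 0] - effort

x0 = np.array([0.0, 1.0])
plan = minimize(lambda u: -car_return(u, x0), np.zeros(N), method="L-BFGS-B")
print("car plan over the horizon:", plan.x)
```

In the paper the outer gradient flows analytically through the inner argmax (see the implicit-function-theorem derivation in Method below), rather than through a finite-differenced inner solver as in this sketch.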
## Method
Model the interaction as a dynamical system in which the car's action influences both the next state and the driver's action. The implementation uses Theano to backpropagate through the transition function, and the implicit function theorem to derive a symbolic expression for the gradient of the driver's best response.
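Concretely (my notation, consistent with the reward block above, with R_H the driver's return summed over the horizon): at the driver's optimum the inner gradient vanishes, and differentiating that first-order condition gives the gradient of the driver's best response with respect to the car's plan:

```latex
% First-order condition of the inner optimization:
%   \partial R_H / \partial \mathbf{u}^H = 0   at   \mathbf{u}^{H*}(\mathbf{u}^R)
% Implicit function theorem:
\frac{\partial \mathbf{u}^{H*}}{\partial \mathbf{u}^R}
  = -\left[ \frac{\partial^2 R_H}{\partial (\mathbf{u}^H)^2} \right]^{-1}
    \frac{\partial^2 R_H}{\partial \mathbf{u}^H \, \partial \mathbf{u}^R}
```

This is what lets L-BFGS optimize the car's return through the driver's best response without differentiating through an iterative inner solver.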
## Results
* The car leverages the driver model to **influence** simulated drivers (for which it has a **perfect model**) in **specified ways** (by modifying the car's reward function).
* The car leverages the driver model to **influence** simulated drivers (**perfect model**) so as to be **more efficient with respect to its own objective**.
* The car leverages the driver model to **influence** real human drivers (for whom it has an **imperfect model**) in **specified ways**.
*colors: autonomous car is yellow, driver's car is red*
![](https://i.imgur.com/RK1Gx2P.png)
![](https://i.imgur.com/F77hSfp.png)
![](https://i.imgur.com/3elOG9O.png)
![](https://i.imgur.com/yeSsjiP.png)