Main idea: planning is 'trying things in your head' using an internal model of the world
#### Diagram
https://i.imgur.com/vDAobu1.png
#### Generic algorithm
- steps 1-3: standard reinforcement learning agent, learning from real experience
- step 4: learning domain knowledge by updating the action model
- step 5: planning, i.e. RL on hypothetical experiences generated by the action model
https://i.imgur.com/G6dZ00F.png
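
The five steps can be sketched in code. This is only a minimal illustration, not the algorithm from the figure verbatim: it assumes one-step tabular Q-learning as the RL method, a made-up five-state chain world (`env_step`), epsilon-greedy action selection, and uniform random sampling of previously seen state-action pairs in step 5. All names and parameter values (`dyna_q`, `k`, `alpha`, `gamma`, `epsilon`) are hypothetical.

```python
import random
from collections import defaultdict

N_STATES = 5          # assumed toy world: chain of states 0..4, goal at state 4
ACTIONS = (-1, +1)    # move left or move right


def env_step(state, action):
    """Hypothetical deterministic chain world; reward 1.0 on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def q_update(q, s, a, r, s2, alpha=0.5, gamma=0.9):
    """One-step Q-learning backup, used for real and hypothetical experience alike."""
    best_next = max(q[(s2, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])


def dyna_q(episodes=50, k=10, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)   # action values, default 0.0
    model = {}               # action model: (state, action) -> (next_state, reward)
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            # Step 1: observe the state and choose an action (epsilon-greedy,
            # ties broken at random so the untrained agent still explores).
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                best = max(q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])
            # Step 2: observe the resulting reward and new state.
            s2, r = env_step(s, a)
            # Step 3: apply RL to this real experience.
            q_update(q, s, a, r, s2)
            # Step 4: update the action model with this experience.
            model[(s, a)] = (s2, r)
            # Step 5: planning, repeated k times on hypothetical experience.
            for _ in range(k):
                hs, ha = rng.choice(list(model))   # pick a hypothetical state and action
                hs2, hr = model[(hs, ha)]          # predict the outcome with the model
                q_update(q, hs, ha, hr, hs2)       # apply RL to the prediction
            s = s2
    return q
```

Setting `k=0` in this sketch reduces it to a plain Q-learning agent (steps 1-3 plus an unused model), which makes the contribution of planning easy to compare.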
#### Action model
- input: state and action; output: the immediately resulting state and reward
- search control: the process that selects which hypothetical state and action to try during planning
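
To make the interface concrete, here is a small sketch of an action model together with one possible search-control rule. It assumes a deterministic, tabular world in which the model simply remembers the last observed outcome for each state-action pair. The class `TabularActionModel` and the function `uniform_search_control` are hypothetical names for illustration; uniform sampling is just one simple choice of search control.

```python
import random


class TabularActionModel:
    """Assumed tabular model: stores the last observed outcome per (state, action)."""

    def __init__(self):
        self._table = {}

    def update(self, state, action, next_state, reward):
        """Record a real experience."""
        self._table[(state, action)] = (next_state, reward)

    def predict(self, state, action):
        """Return the predicted (next_state, reward) for a state-action pair."""
        return self._table[(state, action)]

    def known_pairs(self):
        """List the (state, action) pairs the model can make predictions for."""
        return list(self._table)


def uniform_search_control(model, rng):
    """Pick a hypothetical (state, action) uniformly among those already experienced."""
    return rng.choice(model.known_pairs())


# Usage: learn from two real transitions, then generate one hypothetical experience.
model = TabularActionModel()
model.update("s0", "right", "s1", 0.0)
model.update("s1", "right", "goal", 1.0)
state, action = uniform_search_control(model, random.Random(0))
next_state, reward = model.predict(state, action)
```

Other search-control rules, for example favouring pairs whose predicted outcome recently changed, would slot in by replacing `uniform_search_control` while leaving the model untouched.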
## Potential problems
1. reliance on supervised learning
2. hierarchical planning
3. ambiguous and hidden state
4. ensuring variety in action
5. taskability
6. incorporation of prior knowledge