Prioritized Experience Replay on ShortScience.org

arxiv.org
arxiv-vanity.com
scholar.google.com

Prioritized Experience Replay
Tom Schaul and John Quan and Ioannis Antonoglou and David Silver
arXiv e-Print archive - 2015 via Local arXiv
Keywords: cs.LG
more

Summaries/Notes 1

[link] Summary by Tianxiao Zhao 6 years ago

this paper: develop a framework to replay important transitions more frequently -> learn efficienty

prior work: uniformly sample a replay memory to get experience transitions
 
evaluate: DQN + PER outperform DQN on 41 out of 49 Atari games

## Introduction

**issues with online RL:** (solution: experience replay) 

1. strongly correlated updates that break the i.i.d. assumption
2. rapid forgetting of rare experiences that could be useful later

**key idea:** 

more frequently replay transitions with high expected learning progress, as measured by the magnitude of their temporal-difference (TD) error

**issues with prioritization:**

1. loss of diversity -> alleviate with stochastic prioritization
2. introduce bias -> correct with importance sampling

## Prioritized Replay

**criterion:**

- the amount the RL agent can learn from a transition in its current state (expected learning progress) -> not directly accessible
- proxy: the magnitude of a transition’s TD error ~= how far the value is from its next-step bootstrap estimate

**stochastic sampling:**

$$P(i)=\frac{p_i^\alpha}{\sum_k p_k^\alpha}$$

*p_i* > 0: priority of transition *i*; 0 <= *alpha* <= 1 determines how much prioritization is used.

*two variants:*

1. proportional prioritization: *p_i* = abs(TD\_error\_i) + epsilon (small positive constant to avoid zero prob)
2. rank-based prioritization: *p_i* = 1/rank(i); **more robust as it is insensitive to outliers**

https://i.imgur.com/T8je5wj.png

**importance sampling:**

IS weights: 

$$w_i = \left(\frac{1}{N} \cdot \frac{1}{P(i)}\right)^\beta $$

- weights can be folded into the Q-learning update by using $w_i*\delta_i$ instead of $\delta_i$
- weights normalized by $\frac{1}{\max w_i}$

Your comment:

Write your summary here (You can use $\LaTeX$ and markdown syntax):

Anon Private