Neural Episodic Control on ShortScience.org

Neural Episodic Control
Pritzel, Alexander and Uria, Benigno and Srinivasan, Sriram and Badia, Adrià Puigdomènech and Vinyals, Oriol and Hassabis, Demis and Wierstra, Daan and Blundell, Charles
International Conference on Machine Learning - 2017 via Local Bibsonomy
Keywords: dblp

Summary by Léo Paillier 7 years ago

 _Objective:_ Reduce learning time for [DQN](https://deepmind.com/research/dqn/)-type architectures.

They introduce a new network component, the DND (Differentiable Neural Dictionary): a key-value memory that accepts arbitrary vectors as keys (in practice, embeddings) and answers a query by weighting the stored values with a kernel computed between the query and the stored keys. The lookup is differentiable, so gradients can flow through it.
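A minimal sketch of such a kernel-weighted lookup, in plain Python. The summary does not say which kernel is used; the inverse-distance kernel, the `delta` smoothing term, and the class and method names below are assumptions chosen for illustration, and a real implementation would restrict the sum to a few nearest neighbours and run inside an autodiff framework.

```python
class DND:
    """Toy key-value memory with a kernel-weighted read (illustrative only)."""

    def __init__(self, delta=1e-3):
        self.keys = []      # stored embeddings, each a list of floats
        self.values = []    # one scalar value per stored key
        self.delta = delta  # assumed smoothing term, avoids division by zero

    def write(self, key, value):
        """Append a (key, value) pair to the memory."""
        self.keys.append(list(key))
        self.values.append(float(value))

    def lookup(self, query):
        """Return the kernel-weighted average of the stored values."""
        if not self.keys:
            raise ValueError("cannot read from an empty memory")
        kernels = []
        for key in self.keys:
            sq_dist = sum((q - k) ** 2 for q, k in zip(query, key))
            # Assumed kernel: inverse squared distance, large for close keys.
            kernels.append(1.0 / (sq_dist + self.delta))
        total = sum(kernels)
        # Normalise the kernels into weights and average the values.
        return sum(w / total * v for w, v in zip(kernels, self.values))
```

A query that lands on a stored key returns almost exactly that key's value, while a query halfway between two keys returns the mean of their values. Every step is a smooth arithmetic operation, which is what makes the read differentiable.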

## Architecture:

The network works in two stages:

1.  A standard CNN that computes an embedding for every image.
2.  One DND per possible action (controller input), which stores the embedding as the key and the estimated return (the Q-value estimate) as the value.
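The two stages above can be sketched as an action-selection routine. This is a simplified illustration, not the authors' code: the CNN is left out (the embedding is passed in directly), each per-action memory is a plain list of `(key, value)` pairs, and `kernel_lookup`, `select_action`, and the inverse-distance kernel are hypothetical names and choices.

```python
def kernel_lookup(memory, query, delta=1e-3):
    """Kernel-weighted average of the values in one action's memory."""
    weights = [
        1.0 / (sum((q - k) ** 2 for q, k in zip(query, key)) + delta)
        for key, _ in memory
    ]
    total = sum(weights)
    return sum(w * value for w, (_, value) in zip(weights, memory)) / total


def select_action(embedding, memories):
    """Query every action's memory and pick the highest estimate.

    `memories` maps each action to its own list of (key, value) pairs,
    mirroring the one-memory-per-action layout described above.
    """
    estimates = {
        action: kernel_lookup(memory, embedding)
        for action, memory in memories.items()
    }
    best_action = max(estimates, key=estimates.get)
    return best_action, estimates


# Usage: the embedding would normally come from the CNN.
memories = {
    "left": [([0.0, 0.0], 1.0), ([4.0, 4.0], 0.0)],
    "right": [([0.0, 0.0], 2.0), ([4.0, 4.0], 3.0)],
}
action, estimates = select_action([0.1, 0.0], memories)
```

Keeping a separate memory per action means the same embedding can map to a different value estimate for each action, so greedy selection reduces to one read per action followed by an argmax.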

They also keep a replay buffer of tuples (previous image, action, reward, next image), and training follows the usual DQN-style procedure of sampling minibatches from it.
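A buffer of this kind might look like the following sketch. The fixed capacity, the uniform sampling, and the `ReplayBuffer` name are assumptions for illustration; the summary only says that the tuples are stored and that training is otherwise standard.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (previous image, action, reward, next image) tuples."""

    def __init__(self, capacity=100_000):
        # A bounded deque silently drops the oldest tuple once it is full.
        self.storage = deque(maxlen=capacity)

    def add(self, prev_image, action, reward, next_image):
        """Record one transition."""
        self.storage.append((prev_image, action, reward, next_image))

    def sample(self, batch_size):
        """Draw a uniform random minibatch without replacement."""
        size = min(batch_size, len(self.storage))
        return random.sample(list(self.storage), size)

    def __len__(self):
        return len(self.storage)
```

Sampling at random rather than replaying transitions in order reduces the correlation between consecutive training examples, which is the usual motivation for a replay buffer in DQN-style training.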

[![screen shot 2017-04-12 at 11 23 32 am](https://cloud.githubusercontent.com/assets/17261080/24951103/92930022-1f73-11e7-97d2-628e2f4b5a33.png)](https://cloud.githubusercontent.com/assets/17261080/24951103/92930022-1f73-11e7-97d2-628e2f4b5a33.png)

## Results:

It clearly improves learning speed early on, but other techniques eventually catch up and outperform it.
