Neural Episodic Control
Alexander Pritzel, Benigno Uria, Sriram Srinivasan, Adrià Puigdomènech Badia, Oriol Vinyals, Demis Hassabis, Daan Wierstra, Charles Blundell
International Conference on Machine Learning - 2017 via Local Bibsonomy
_Objective:_ Reduce learning time for [DQN](https://deepmind.com/research/dqn/)-type architectures.
They introduce a new network component, the DND (Differentiable Neural Dictionary). It is a dictionary that accepts any vector as a key (in practice, embeddings) and answers a lookup by weighting the stored values with a kernel between the query and the stored keys. The lookup is differentiable, so gradients can flow through it.
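To make the kernel-weighted lookup concrete, here is a minimal sketch in plain Python. The inverse-distance kernel and the names `inverse_distance_kernel`, `dnd_lookup`, and `delta` are assumptions chosen for illustration, not necessarily the exact formulation used by the authors; any positive kernel would fit the same structure.

```python
from typing import Sequence


def inverse_distance_kernel(query: Sequence[float], key: Sequence[float],
                            delta: float = 1e-3) -> float:
    """Assumed kernel: large when query and key are close, small otherwise."""
    sq_dist = sum((q - k) ** 2 for q, k in zip(query, key))
    return 1.0 / (sq_dist + delta)  # delta avoids division by zero


def dnd_lookup(query: Sequence[float],
               keys: Sequence[Sequence[float]],
               values: Sequence[float],
               delta: float = 1e-3) -> float:
    """Return the kernel-weighted average of the stored values."""
    weights = [inverse_distance_kernel(query, k, delta) for k in keys]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


# Two stored entries: embedding -> value.
keys = [[0.0, 0.0], [10.0, 10.0]]
values = [1.0, 5.0]

near_first = dnd_lookup([0.0, 0.0], keys, values)  # dominated by the first entry
midway = dnd_lookup([5.0, 5.0], keys, values)      # equal weights on both entries
```

Every operation in the lookup is a sum, product, or division, which is why the same expression can be differentiated with respect to the query, the keys, and the values when written in an autodiff framework.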
## Architecture:
The network works in two stages:
1. A conventional CNN computes an embedding for every image.
2. One DND per possible action (controller input) stores the embedding as the key and the estimated reward as the value.
They also keep a buffer of all (previous image, action, reward, next image) tuples and train from it with the standard experience-replay procedure.
[![screen shot 2017-04-12 at 11 23 32 am](https://cloud.githubusercontent.com/assets/17261080/24951103/92930022-1f73-11e7-97d2-628e2f4b5a33.png)](https://cloud.githubusercontent.com/assets/17261080/24951103/92930022-1f73-11e7-97d2-628e2f4b5a33.png)
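The two-stage layout described above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: `embed` is a hypothetical stand-in that flattens the image instead of running a CNN, the stored value is the immediate reward rather than a learned return estimate, the kernel is an assumed inverse-distance kernel, and the class and method names are invented for the example.

```python
class DND:
    """Toy per-action memory: embedding keys mapped to scalar values."""

    def __init__(self, delta=1e-3):
        self.keys = []
        self.values = []
        self.delta = delta

    def write(self, key, value):
        self.keys.append(list(key))
        self.values.append(float(value))

    def lookup(self, query):
        if not self.keys:
            return 0.0  # assumption: an empty memory predicts zero
        weights = [
            1.0 / (sum((q - k) ** 2 for q, k in zip(query, key)) + self.delta)
            for key in self.keys
        ]
        total = sum(weights)
        return sum(w * v for w, v in zip(weights, self.values)) / total


def embed(image):
    """Hypothetical stand-in for the CNN: flatten a 2-D image to a vector."""
    return [float(pixel) for row in image for pixel in row]


class Agent:
    def __init__(self, num_actions):
        self.memories = [DND() for _ in range(num_actions)]  # one DND per action
        self.replay = []  # (previous image, action, reward, next image) tuples

    def act(self, image):
        h = embed(image)
        estimates = [memory.lookup(h) for memory in self.memories]
        # Greedy choice: the action whose memory predicts the highest value.
        return max(range(len(estimates)), key=estimates.__getitem__)

    def observe(self, prev_image, action, reward, next_image):
        self.replay.append((prev_image, action, reward, next_image))
        # Simplification: store the immediate reward as the value.
        self.memories[action].write(embed(prev_image), reward)


agent = Agent(num_actions=2)
image = [[0, 1], [1, 0]]
agent.observe(image, action=1, reward=1.0, next_image=image)
first_choice = agent.act(image)   # only action 1 has a positive estimate
agent.observe(image, action=0, reward=2.0, next_image=image)
second_choice = agent.act(image)  # action 0 now has the higher estimate
```

Keeping a separate memory per action means each lookup only compares the current embedding against states where that action was actually taken, so the per-action estimates do not interfere with each other.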
## Results:
The method clearly improves learning speed early in training, but other techniques eventually catch up and outperform it.