Neural Episodic Control
Badia, Adrià Puigdomènech
International Conference on Machine Learning - 2017
_Objective:_ Reduce learning time for [DQN](https://deepmind.com/research/dqn/)-type architectures.
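They introduce a new network component, the DND (Differentiable Neural Dictionary): a dictionary that accepts any key (in practice, embeddings) and computes the value for a query as a kernel-weighted combination of the stored values, with weights given by a kernel between the query key and the stored keys. The lookup is differentiable, so gradients can flow back into the keys and values.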
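A minimal sketch of such a kernel-weighted lookup, in plain Python so it stays self-contained. The class name `DND`, the `write`/`lookup` methods, and the inverse-distance kernel with smoothing constant `delta` are assumptions of this sketch, not details given in this summary; in a real model the same arithmetic would be written in an autodiff framework so that it is actually differentiable.

```python
from typing import List, Sequence


def inverse_distance_kernel(a: Sequence[float], b: Sequence[float],
                            delta: float = 1e-3) -> float:
    """Assumed kernel: large when the two keys are close, small when far apart."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1.0 / (squared_distance + delta)


class DND:
    """Toy dictionary: stores (key, value) pairs, reads by kernel-weighted average."""

    def __init__(self) -> None:
        self.keys: List[List[float]] = []
        self.values: List[float] = []

    def write(self, key: Sequence[float], value: float) -> None:
        # Append-only for simplicity; no eviction or update of existing keys.
        self.keys.append(list(key))
        self.values.append(float(value))

    def lookup(self, query: Sequence[float]) -> float:
        if not self.keys:
            raise ValueError("lookup on an empty dictionary")
        weights = [inverse_distance_kernel(query, key) for key in self.keys]
        total = sum(weights)
        # Normalised weighted sum of the stored values.
        return sum(w * v for w, v in zip(weights, self.values)) / total


dnd = DND()
dnd.write([0.0, 0.0], 1.0)
dnd.write([10.0, 10.0], 5.0)
near_first = dnd.lookup([0.0, 0.0])   # dominated by the first entry
midpoint = dnd.lookup([5.0, 5.0])     # equidistant, so the plain average
```

A query that coincides with a stored key returns almost exactly that key's value, while a query halfway between two keys returns their average, which is the interpolation behaviour the kernel is meant to provide.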
The architecture has two stages:
1. A standard CNN that computes an embedding for every image.
2. One DND per possible action (controller input), which stores the embedding as key and the estimated return (Q-value) for that action as value.
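The two stages can be sketched as an action-selection routine. Everything named here is hypothetical: `embed` is a stand-in for the CNN (it just averages image rows), each per-action memory is a plain list of `(key, value)` pairs, the kernel is an assumed inverse-distance kernel, and an empty memory is assumed to yield a default value of 0.0.

```python
from typing import Dict, List, Sequence, Tuple

Memory = List[Tuple[List[float], float]]  # (embedding key, estimated value) pairs


def embed(image: Sequence[Sequence[float]]) -> List[float]:
    """Placeholder for the CNN: one feature per image row (its mean)."""
    return [sum(row) / len(row) for row in image]


def kernel_lookup(query: Sequence[float], memory: Memory,
                  delta: float = 1e-3) -> float:
    """Kernel-weighted average of the stored values (assumed inverse-distance kernel)."""
    if not memory:
        return 0.0  # assumed default for an action that has no entries yet
    weights = [
        1.0 / (sum((q - k) ** 2 for q, k in zip(query, key)) + delta)
        for key, _ in memory
    ]
    total = sum(weights)
    return sum(w * value for w, (_, value) in zip(weights, memory)) / total


def select_action(image: Sequence[Sequence[float]],
                  memories: Dict[str, Memory]) -> str:
    """Stage 1: embed the image. Stage 2: query every action's memory, act greedily."""
    h = embed(image)
    q_values = {action: kernel_lookup(h, memory)
                for action, memory in memories.items()}
    return max(q_values, key=q_values.get)


memories: Dict[str, Memory] = {
    "left": [([0.0, 0.0], 1.0)],
    "right": [([0.0, 0.0], 2.0)],
}
chosen = select_action([[0.0, 0.0], [0.0, 0.0]], memories)
```

Keeping a separate memory per action means a single embedding is computed once per image and then reused for every action's value estimate.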
They also keep a replay buffer of all tuples (previous image, action, reward, next image), and training follows the standard DQN-style approach of learning from batches sampled from that buffer.
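A buffer of this kind can be sketched with the standard library alone. The names `Transition` and `ReplayBuffer`, the fixed capacity, and uniform random sampling are assumptions of this sketch; the summary does not say how the buffer is sized or sampled.

```python
import random
from collections import deque, namedtuple

# One stored step, with the same four fields as the tuple described above.
Transition = namedtuple(
    "Transition", ["prev_image", "action", "reward", "next_image"]
)


class ReplayBuffer:
    """Fixed-capacity buffer; the oldest transitions are dropped first."""

    def __init__(self, capacity: int) -> None:
        self.storage = deque(maxlen=capacity)

    def add(self, prev_image, action, reward, next_image) -> None:
        self.storage.append(Transition(prev_image, action, reward, next_image))

    def sample(self, batch_size: int) -> list:
        # Uniform sampling without replacement, capped at the current size.
        count = min(batch_size, len(self.storage))
        return random.sample(list(self.storage), count)

    def __len__(self) -> int:
        return len(self.storage)


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.add(prev_image=step, action=0, reward=1.0, next_image=step + 1)
batch = buffer.sample(2)
```

With a capacity of 3 and five insertions, only the three most recent transitions remain, and each sampled batch is drawn from those.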
[![screen shot 2017-04-12 at 11 23 32 am](https://cloud.githubusercontent.com/assets/17261080/24951103/92930022-1f73-11e7-97d2-628e2f4b5a33.png)](https://cloud.githubusercontent.com/assets/17261080/24951103/92930022-1f73-11e7-97d2-628e2f4b5a33.png)
The method clearly improves learning speed early in training, but other techniques eventually catch up and outperform it.