Hybrid computing using a neural network with dynamic external memory
Alex Graves, Greg Wayne, Malcolm Reynolds, Tim Harley, Ivo Danihelka, Agnieszka Grabska-Barwińska, Sergio Gómez Colmenarejo, Edward Grefenstette, Tiago Ramalho, John Agapiou, Adrià Puigdomènech Badia, Karl Moritz Hermann, Yori Zwols, Georg Ostrovski, Adam Cain, Helen King, Christopher Summerfield, Phil Blunsom, Koray Kavukcuoglu, Demis Hassabis
Nature - 2016
The paper introduces an approach to model external memory in a differentiable way. Memory is modeled as an $N \times W$ matrix: $N$ independent storage locations, each able to store a datum of length $W$.
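A minimal sketch of this memory layout in NumPy (the sizes $N=16$, $W=8$ are illustrative, not taken from the paper):

```python
import numpy as np

# Illustrative sizes, not from the paper: N locations, each of width W.
N, W = 16, 8
memory = np.zeros((N, W))        # each of the N rows stores one datum of length W
memory[3] = np.random.randn(W)   # writing a datum fills one row
```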
### DNC Architecture
A controller network is trained to use the memory. Memory access proceeds in cycles: at each time-step $t$ the network emits read and write commands as part of its output. The commands are then processed, and the data that was read is fed back to the network at time-step $t+1$ as part of its input. Any deep learning architecture can serve as the controller (e.g. a standard feed-forward network); the paper uses a deep LSTM.
https://storage.googleapis.com/deepmind-live-cms/images/dnc_figure1.width-1500.png
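A hedged sketch of this read/write cycle, assuming placeholder functions `controller_step` and `memory_access` (these names and signatures are illustrative, not the paper's API):

```python
import numpy as np

def run_dnc(inputs, controller_step, memory_access, W):
    """Run the controller over a sequence, feeding the data read from memory
    at step t back in as part of the input at step t+1."""
    state = None
    read_vector = np.zeros(W)            # nothing has been read before t = 0
    outputs = []
    for x_t in inputs:
        # Controller sees the current input concatenated with the previous read.
        controller_in = np.concatenate([x_t, read_vector])
        output_t, interface_t, state = controller_step(controller_in, state)
        # The interface vector is translated into memory read/write commands,
        # and the resulting read vector is carried over to the next step.
        read_vector = memory_access(interface_t)
        outputs.append(output_t)
    return outputs
```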
### Memory Interaction
The control commands allow the network to interact with the data in three different ways:
1. Content lookup: access is weighted by how closely each location matches a key emitted by the controller (see the sketch after this list).
2. Sequential read access: for each vector $v$ the network read at time $t$, it can access the data that was written directly after $v$ was written.
3. Usage-based write access: a "usage" vector $u \in [0,1]^N$ models how heavily each location is used. The network can choose to write to the location with the lowest usage, and $u$ can be decreased at each time-step ("freeing" memory).
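For the content lookup, a common formulation (and the one the DNC uses) is a softmax over cosine similarities between the key and each memory row, sharpened by a key-strength scalar $\beta$. A minimal sketch, with illustrative variable names:

```python
import numpy as np

def content_weighting(memory, key, beta):
    """Soft weighting over the N locations by cosine similarity between
    each memory row and the key, sharpened by the strength beta."""
    eps = 1e-8
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + eps
    similarity = memory @ key / norms      # cosine similarity per row, shape (N,)
    scores = beta * similarity
    scores -= scores.max()                 # subtract max for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()         # non-negative, sums to 1 over locations
```

Larger $\beta$ concentrates the weighting on the best-matching location; small $\beta$ spreads it more evenly across memory.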
#### Differentiable Operations
All memory operations are modeled in a differentiable way, so that the entire model can be trained end-to-end using gradient descent. To do this, each read command is mapped to a soft read weighting $w^r \in [0,1]^N$ with $\sum^N_{i=1} w^r_i = 1$. The read vector $r$ is then defined as the weighted sum over the rows of the memory: $r = \sum^N_{i=1} w^r_i \, M[i, \cdot]$.
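The read itself is then just a matrix-vector product; a minimal sketch:

```python
import numpy as np

def soft_read(memory, w_r):
    """Differentiable read: memory is (N, W), w_r is a distribution over the
    N locations, and the result r is the weighted sum of rows (length W)."""
    return memory.T @ w_r
```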
### Experiments
The network was trained on a variety of memory-intensive tasks. The results show that the network learns to take advantage of the external memory.
https://www.youtube.com/watch?v=B9U8sI7TcMY