José Manuel Rodríguez Sotelo's profile

papers.nips.cc
scholar.google.com

Recurrent Models of Visual Attention
Mnih, Volodymyr and Heess, Nicolas and Graves, Alex and Kavukcuoglu, Koray
Neural Information Processing Systems Conference - 2014 via Local Bibsonomy
Keywords: dblp

[link] Summary by José Manuel Rodríguez Sotelo 9 years ago

The main contribution of this paper is introducing a (recurrent) visual attention model (RAM). Convolutional networks (CNs) seem to do a great job in computer vision tasks. Unfortunately, the amount of computation they require grows (at least) linearly in the size of the image. The RAM surges as an alternative that performs as well as CNs, but where the amount of computation can be controlled independently of the image size.

#### What is RAM?
A model that describes a sequential decision process of a goal-directed agent interacting with a visual environment. It involves deciding where to look in a constrained visual environment and taking decisions to maximize a reward. It uses a recurrent neural network to combine information from the past to decide its future actions.

#### What do we gain?
The attention mechanism takes care of deciding the parts of the image that are worth looking to solve the task. Therefore, it will ignore clutter. In addition, the amount of computation can be decided independently of the image sizes. Furthermore, this could also be directly applied to variable size images as well as detecting multiple objects in one image.

#### What follows?
An extension that may be worth exploring is whether the attention mechanism can be made differentiable. This might be already done in other papers.

#### Like:
* Can be used for analyzing videos and playing games.
Useful in cluttered environments.

#### Dislike:
* The model is non-differentiable.

jmlr.org
scholar.google.com

DRAW: A Recurrent Neural Network For Image Generation
Gregor, Karol and Danihelka, Ivo and Graves, Alex and Rezende, Danilo Jimenez and Wierstra, Daan
International Conference on Machine Learning - 2015 via Local Bibsonomy
Keywords: dblp

[link] Summary by José Manuel Rodríguez Sotelo 9 years ago

The paper introduces a sequential variational auto-encoder that generates complex images iteratively. The authors also introduce a new spatial attention mechanism that allows the model to focus on small subsets of the image. This new approach for image generation produces images that can’t be distinguished from the training data.

#### What is DRAW:
The deep recurrent attention writer (DRAW) model has two differences with respect to other variational auto-encoders. First, the encoder and the decoder are recurrent networks. Second, it includes an attention mechanism that restricts the input region observed by the encoder and the output region observed by the decoder.

#### What do we gain?
The resulting images are greatly improved by allowing a conditional and sequential generation. In addition, the spatial attention mechanism can be used in other contexts to solve the “Where to look?” problem.

#### What follows?
A possible extension to this model would be to use a convolutional architecture in the encoder or the decoder. Although this might be less useful since we are already restricting the input of the network.

#### Like:
* As observed in the samples generated by the model, the attention mechanism works effectively by reconstructing images in a local way.
* The attention model is fully differentiable.

#### Dislike:
* I think a better exposition of the attention mechanism would improve this paper.

jmlr.org
scholar.google.com

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Ioffe, Sergey and Szegedy, Christian
International Conference on Machine Learning - 2015 via Local Bibsonomy
Keywords: dblp

[link] Summary by José Manuel Rodríguez Sotelo 9 years ago

The main contribution of this paper is introducing a new transformation that the authors call Batch Normalization (BN). The need for BN comes from the fact that during the training of deep neural networks (DNNs) the distribution of each layer’s input change. This phenomenon is called internal covariate shift (ICS).

#### What is BN?
Normalize each (scalar) feature independently with respect to the mean and variance of the mini batch. Scale and shift the normalized values with two new parameters (per activation) that will be learned. The BN consists of making normalization part of the model architecture.

#### What do we gain?
According to the author, the use of BN provides a great speed up in the training of DNNs. In particular, the gains are greater when it is combined with higher learning rates. In addition, BN works as a regularizer for the model which allows to use less dropout or less L2 normalization. Furthermore, since the distribution of the inputs is normalized, it also allows to use sigmoids as activation functions without the saturation problem.

#### What follows?
This seems to be specially promising for training recurrent neural networks (RNNs). The vanishing and exploding gradient problems \cite{journals/tnn/BengioSF94} have their origin in the iteration of transformation that scale up or down the activations in certain directions (eigenvectors). It seems that this regularization would be specially useful in this context since this would allow the gradient to flow more easily. When we unroll the RNNs, we usually have ultra deep networks.

#### Like
* Simple idea that seems to improve training.
* Makes training faster.
* Simple to implement. Probably.
* You can be less careful with initialization.

#### Dislike
* Does not work with stochastic gradient descent (minibatch size = 1).
* This could reduce the parallelism of the algorithm since now all the examples in a mini batch are tied.
* Results on ensemble of networks for ImageNet makes it harder to evaluate the relevance of BN by itself. (Although they do mention the performance of a single model).

José Manuel Rodríguez Sotelo

sciscore: 3.667