## Introduction
* [Link to Paper](http://arxiv.org/pdf/1412.6071v4.pdf)
* Spatial pooling layers are building blocks for Convolutional Neural Networks (CNNs).
* Input to the pooling operation is an $N_{in} \times N_{in}$ matrix and the output is a smaller $N_{out} \times N_{out}$ matrix.
* The pooling operation divides the $N_{in} \times N_{in}$ square into $N_{out}^2$ pooling regions $P_{i, j}$.
* $P_{i, j} \subset \{1, 2, \dots, N_{in}\}^2$ $\forall$ $(i, j) \in \{1, \dots, N_{out}\}^2$.
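* For example, MP2 (next section) corresponds to $N_{out} = N_{in}/2$ with $P_{i, j} = \{2i - 1, 2i\} \times \{2j - 1, 2j\}$.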
## MP2
* Refers to the 2x2 max-pooling layer (a minimal numpy sketch appears at the end of this section).
* The most popular choice of max-pooling operation.
### Advantages of MP2
* Fast.
* Quickly reduces the size of the hidden layer.
* Encodes a degree of invariance with respect to translations and elastic distortions.
### Issues with MP2
* Disjoint nature of pooling regions.
* Since the spatial size halves at every pooling layer, stacks of back-to-back convolutional layers are needed between poolings to build deep networks.
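As a reference point, here is a minimal numpy sketch of MP2, i.e., plain 2x2 max pooling over disjoint blocks; the function name and the even-side-length assumption are ours, not the paper's:

```python
import numpy as np

def mp2(x):
    """2x2 max pooling: each output cell is the max of a disjoint 2x2 block."""
    n_in = x.shape[0]
    assert n_in % 2 == 0, "MP2 assumes an even side length"
    n_out = n_in // 2
    # Group the input into n_out x n_out blocks of shape 2x2,
    # then reduce each block with max.
    return x.reshape(n_out, 2, n_out, 2).max(axis=(1, 3))

x = np.arange(16).reshape(4, 4)
print(mp2(x))  # [[ 5  7]
               #  [13 15]]
```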
## FMP
* FMP (fractional max-pooling) reduces the spatial size of the image by a factor of *α*, where *α ∈ (1, 2)*.
* Introduces randomness in terms of choice of pooling region.
* Pooling regions can be chosen in a *random* or *pseudorandom* manner.
* Pooling regions can be *disjoint* or *overlapping*.
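* For example, with $\alpha = \sqrt{2}$, two consecutive FMP layers halve the spatial size, so roughly twice as many pooling layers fit into the network as with MP2.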
## Generating Pooling Regions
* Let $a_i$ and $b_i$ be two increasing sequences of integers, starting at 1 and ending at $N_{in}$.
* Increments are either 1 or 2.
* For *disjoint* regions, $P_{i, j} = [a_{i-1}, a_i - 1] \times [b_{j-1}, b_j - 1]$.
* For *overlapping* regions, $P_{i, j} = [a_{i-1}, a_i] \times [b_{j-1}, b_j]$.
* Pooling regions can be generated *randomly* by choosing the increment randomly at each step.
* To generate pooling regions in a *pseudorandom* manner, choose $a_i = \lceil \alpha(i + u) \rceil$, where $\alpha \in (1, 2)$ and $u$ is random in $(0, 1)$ (see the sketch after this list).
* Each FMP layer uses a different pair of sequences.
* An FMP network can be thought of as an ensemble of similar networks, with each different pooling-region configuration defining a different member of the ensemble.
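A minimal numpy sketch of the region-generation rules above, applied as a single FMP layer. The function names, the assertion on sizes, and the fix-up that forces the pseudorandom sequence to end exactly at $N_{in}$ are our assumptions; the paper does not spell out the boundary handling:

```python
import math
import random
import numpy as np

def fmp_sequence(n_in, n_out, pseudorandom=True):
    """Increasing sequence 1 = a_0 < ... < a_{n_out} = n_in, increments in {1, 2}."""
    n_twos = n_in - 1 - n_out  # increments of 2 needed to span [1, n_in]
    assert 0 <= n_twos <= n_out, "requires 1 < n_in / n_out <= 2"
    if pseudorandom:
        # a_i = ceil(alpha * (i + u)) with alpha = n_in / n_out, u ~ U(0, 1).
        alpha, u = n_in / n_out, random.random()
        a = [math.ceil(alpha * (i + u)) for i in range(n_out + 1)]
        incs = [a[i + 1] - a[i] for i in range(n_out)]
        # The raw formula overshoots n_in by one unit in total; demote 2s to 1s
        # until the sequence ends exactly at n_in (our boundary assumption).
        while sum(incs) > n_in - 1:
            incs[incs.index(2)] = 1
    else:
        # Random variant: choose the order of the 1- and 2-increments uniformly.
        incs = [2] * n_twos + [1] * (n_out - n_twos)
        random.shuffle(incs)
    return np.cumsum([1] + incs)

def fmp(x, n_out, overlapping=True, pseudorandom=True):
    """One fractional max-pooling layer on a 2-D array, down to n_out x n_out."""
    a = fmp_sequence(x.shape[0], n_out, pseudorandom)
    b = fmp_sequence(x.shape[1], n_out, pseudorandom)
    end = 1 if overlapping else 0  # overlapping regions share one boundary row/column
    out = np.empty((n_out, n_out))
    for i in range(n_out):
        for j in range(n_out):
            # Disjoint:    P_{i,j} = [a_{i-1}, a_i - 1] x [b_{j-1}, b_j - 1]
            # Overlapping: P_{i,j} = [a_{i-1}, a_i]     x [b_{j-1}, b_j]
            # (1-based inclusive ranges; numpy slices are 0-based, end-exclusive)
            out[i, j] = x[a[i] - 1 : a[i + 1] - 1 + end,
                          b[j] - 1 : b[j + 1] - 1 + end].max()
    return out

x = np.random.rand(25, 25)
print(fmp(x, 18).shape)  # (18, 18): size reduced by a fractional factor ~ 25/18
```

Because the sequences are regenerated on every call, repeated forward passes pool over different regions, which is what makes an FMP network behave like the ensemble described above.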
## Observations
* *Random* FMP is good on its own but may underfit when combined with dropout or training data augmentation.
* *Pseudorandom* approach generates more stable pooling regions.
* *Overlapping* FMP performs better than *disjoint* FMP.
## Weakness
* No justification is provided for the observations mentioned above.
* It remains to be seen how performance is affected if the pooling layers in architectures like GoogLeNet are replaced with FMP layers.