[link]
Summary by Alexander Jung
* Inception v4 is like Inception v3, but
* Slimmed down, i.e. some parts were simplified
* One new version with residual connections (Inception-ResNet-v2), one without (Inception-v4)
* They didn't observe an improved error rate when using residual connections.
* They did however observe that using residual connections decreased their training times.
* They had to scale down the results of their residual modules (multiply them by a constant ~0.1). Otherwise their networks would die (only produce 0s).
* Results on ILSVRC 2012 (val set, 144 crops/image):
* Top-1 Error:
* Inception-v4: 17.7%
* Inception-ResNet-v2: 17.8%
* Top-5 Error:
* Inception-v4: 3.8%
* Inception-ResNet-v2: 3.7%
### Architecture
* Basic structure of Inception-ResNet-v2 (layers, dimensions):
* `Image -> Stem -> 5x Module A -> Reduction-A -> 10x Module B -> Reduction-B -> 5x Module C -> AveragePooling -> Dropout 20% -> Linear, Softmax`
* `299x299x3 -> 35x35x256 -> 35x35x256 -> 17x17x896 -> 17x17x896 -> 8x8x1792 -> 8x8x1792 -> 1792 -> 1792 -> 1000`
* Modules A, B, C are very similar (see the code sketch after this list).
* They contain 2 (B, C) or 3 (A) branches.
* Each branch starts with a 1x1 convolution on the input.
* All branches merge into one 1x1 convolution, whose output is then added to the original input, as is usual in residual architectures.
* Module A uses 3x3 convolutions, B 7x1 and 1x7, C 3x1 and 1x3.
* The reduction modules also contain multiple branches. One has max pooling (3x3 stride 2), the other branches end in convolutions with stride 2.
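To make the module structure concrete, here is a minimal PyTorch sketch of one Module-A-style residual block. It is only a sketch under assumptions: the class name `ResidualModuleA` and the channel counts are illustrative, not the paper's exact filter numbers.

```python
import torch
import torch.nn as nn


def conv_bn_relu(in_ch, out_ch, kernel_size, padding=0):
    """Conv -> BatchNorm -> ReLU, the basic unit used throughout the network."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, padding=padding, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


class ResidualModuleA(nn.Module):
    """Residual Module A: three branches built from 3x3 convolutions."""

    def __init__(self, channels=256, scale=0.1):
        super().__init__()
        self.scale = scale  # residual scaling factor (~0.1) that keeps the network from "dying"
        # Every branch starts with a 1x1 convolution on the input.
        self.branch0 = conv_bn_relu(channels, 32, 1)
        self.branch1 = nn.Sequential(
            conv_bn_relu(channels, 32, 1),
            conv_bn_relu(32, 32, 3, padding=1),
        )
        self.branch2 = nn.Sequential(
            conv_bn_relu(channels, 32, 1),
            conv_bn_relu(32, 32, 3, padding=1),
            conv_bn_relu(32, 32, 3, padding=1),
        )
        # All branches merge into one 1x1 convolution back to the input depth
        # (linear, so its output can be added to the original input).
        self.merge = nn.Conv2d(32 * 3, channels, 1)

    def forward(self, x):
        branches = torch.cat([self.branch0(x), self.branch1(x), self.branch2(x)], dim=1)
        # Scale the residual before the element-wise addition.
        return torch.relu(x + self.scale * self.merge(branches))


# Overall Inception-ResNet-v2 layout (dimensions as listed above):
#   Image 299x299x3 -> Stem -> 35x35x256
#   -> 5x Module A (35x35x256) -> Reduction-A -> 17x17x896
#   -> 10x Module B (17x17x896) -> Reduction-B -> 8x8x1792
#   -> 5x Module C (8x8x1792) -> AveragePooling -> Dropout 20% -> Linear, Softmax (1000 classes)

if __name__ == "__main__":
    x = torch.randn(1, 256, 35, 35)    # a 35x35x256 feature map, as after the stem
    print(ResidualModuleA()(x).shape)  # torch.Size([1, 256, 35, 35])
```

The module keeps the spatial size and depth of its input, so any number of such blocks can be stacked before a reduction module halves the resolution.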
![Module A](https://raw.githubusercontent.com/aleju/papers/master/neural-nets/images/Inception_v4__module_a.png?raw=true "Module A")
![Module B](https://raw.githubusercontent.com/aleju/papers/master/neural-nets/images/Inception_v4__module_b.png?raw=true "Module B")
![Module C](https://raw.githubusercontent.com/aleju/papers/master/neural-nets/images/Inception_v4__module_c.png?raw=true "Module C")
![Reduction Module A](https://raw.githubusercontent.com/aleju/papers/master/neural-nets/images/Inception_v4__reduction_a.png?raw=true "Reduction Module A")
*From top to bottom: Module A, Module B, Module C, Reduction Module A.*
![Top 5 error](https://raw.githubusercontent.com/aleju/papers/master/neural-nets/images/Inception_v4__top5_error.png?raw=true "Top 5 error")
*Top-5 error by epoch, for models with (red, solid, bottom) and without (green, dashed) residual connections.*
-------------------------
### Rough chapter-wise notes
### Introduction, Related Work
* Inception v3 had to be adapted to the constraints of DistBelief. Inception v4 is designed for TensorFlow, which removes those constraints and allows a simplified architecture.
* Authors don't think that residual connections are inherently needed to train deep nets, but they do speed up the training.
* History:
* Inception v1 - Introduced inception blocks
* Inception v2 - Added Batch Normalization
* Inception v3 - Factorized the inception blocks further (more submodules)
* Inception v4 - Simplified the blocks; the companion Inception-ResNet variants add residual connections
### Architectural Choices
* Previous architectures were constrained due to memory problems. TensorFlow got rid of that problem.
* Previous architectures were carefully/conservatively extended. Architectures ended up being quite complicated. This version slims down everything.
* They had problems with residual networks dying when they contained more than 1000 filters (per inception module, apparently). They fixed that by multiplying the output of the residual subnetwork (before the element-wise addition) by a constant factor of about 0.1.
### Training methodology
* Kepler GPUs, TensorFlow, RMSProp (SGD with momentum apparently performed worse)
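For reference, the paper's training section reports RMSProp with a decay of 0.9 and ε = 1.0, and a learning rate of 0.045 decayed by a factor of 0.94 every two epochs. Below is a minimal PyTorch sketch of such a setup; the model and the epoch count are placeholders, not values from the paper.

```python
import torch
import torch.nn as nn

# Placeholder model; in the paper this would be Inception-v4 or Inception-ResNet-v2.
model = nn.Conv2d(3, 8, 3)

# RMSProp with decay 0.9 and epsilon 1.0 (alpha is PyTorch's name for the decay).
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.045, alpha=0.9, eps=1.0)

# Learning rate decayed by a factor of 0.94 every two epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.94)

for epoch in range(10):  # placeholder epoch count
    # ... run one training epoch here (forward, loss, backward, optimizer.step()) ...
    scheduler.step()
```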
### Experimental Results
* Their residual version of Inception v4 ("Inception-ResNet-v2") seemed to learn faster than the non-residual version.
* Both ended up at almost the same final error.
* Top-1 Error (ILSVRC 2012 val set, 144 crops/image):
* Inception-v4: 17.7%
* Inception-ResNet-v2: 17.8%
* Top-5 Error (ILSVRC 2012 val set, 144 crops/image):
* Inception-v4: 3.8%
* Inception-ResNet-v2: 3.7%