First published: 2016/10/07
Abstract: We present an interpretation of Inception modules in convolutional neural
networks as being an intermediate step in-between regular convolution and the
recently introduced "separable convolution" operation. In this light, a
separable convolution can be understood as an Inception module with a maximally
large number of towers. This observation leads us to propose a novel deep
convolutional neural network architecture inspired by Inception, where
Inception modules have been replaced with separable convolutions. We show that
this architecture, dubbed Xception, slightly outperforms Inception V3 on the
ImageNet dataset (which Inception V3 was designed for), and significantly
outperforms Inception V3 on a larger image classification dataset comprising
350 million images and 17,000 classes. Since the Xception architecture has the
same number of parameters as Inception V3, the performance gains are not due to
increased capacity but rather to a more efficient use of model parameters.
Xception Net, or Extreme Inception Net, brings a new perspective on Inception Nets. Inception, as first published (as GoogLeNet), consisted of Network-in-Network-style Inception modules like this
![Inception Modules](http://i.imgur.com/jwYhi8t.png)
The idea behind Inception modules was to look at cross-channel correlations (via 1x1 convolutions) and spatial correlations (via 3x3 convolutions) separately. The underlying hypothesis is that cross-channel correlations and spatial correlations are sufficiently decoupled that it is preferable not to map them jointly. This idea is the genesis of Xception Net, which uses depthwise separable convolutions: a spatial convolution applied to each channel independently, followed by a pointwise (1x1) convolution that projects the result to the required number of channels by exploiting cross-channel correlations. Chollet does a wonderful job of explaining how regular convolution (which looks at channel and spatial correlations simultaneously) and depthwise separable convolution (which looks at channel and spatial correlations independently, in successive steps) are the two endpoints of a spectrum, with the original Inception modules lying in between.
![Extreme version of Inception Net](http://i.imgur.com/kylzfIQ.png)
*Note that for Xception Net, Chollet uses depthwise separable layers that first perform a 3x3 convolution per channel and then a 1x1 convolution on the output of those 3x3 convolutions, i.e. the opposite order of operations to the one depicted in the image above.*
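To make the two operations concrete, here is a minimal sketch (assuming TensorFlow/Keras; the feature-map and filter sizes are made up for illustration) contrasting a regular convolution with the 3x3-then-1x1 depthwise separable convolution described above:

```python
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((1, 32, 32, 64))  # dummy feature map: 32x32 spatial, 64 channels

# Regular convolution: maps spatial and cross-channel correlations jointly.
regular = layers.Conv2D(128, kernel_size=3, padding="same")

# Depthwise separable convolution: spatial and cross-channel correlations are
# mapped in two successive, independent steps (3x3 per channel, then 1x1).
depthwise = layers.DepthwiseConv2D(kernel_size=3, padding="same")  # 3x3 per channel
pointwise = layers.Conv2D(128, kernel_size=1)                      # 1x1 across channels

# Keras also bundles the two steps into a single layer.
separable = layers.SeparableConv2D(128, kernel_size=3, padding="same")

print(regular(x).shape, pointwise(depthwise(x)).shape, separable(x).shape)
```

For these shapes, the regular 3x3 convolution has 3·3·64·128 = 73,728 weights, while the depthwise (3·3·64 = 576) plus pointwise (64·128 = 8,192) pair has only 8,768 (ignoring biases), which is where the parameter efficiency of the factorization comes from.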
##### Input
The input consists of images to be classified, along with their corresponding class labels.
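A minimal input-pipeline sketch (assuming TensorFlow/Keras and a hypothetical `data/train/` directory with one sub-folder per class; both the path and the batch size are illustrative). Like Inception V3, Xception works on 299x299 RGB inputs:

```python
import tensorflow as tf

# Hypothetical directory layout: data/train/<class_name>/<image files>
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train",
    image_size=(299, 299),   # input resolution used by Xception / Inception V3
    batch_size=32,
)

# Scale pixels to the [-1, 1] range expected by the Keras Xception weights.
preprocess = tf.keras.applications.xception.preprocess_input
train_ds = train_ds.map(lambda images, labels: (preprocess(images), labels))
```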
##### Architecture
The Xception architecture follows a VGG-16-style linear stack of blocks, with the convolution + max-pool blocks replaced by residual blocks of depthwise separable convolution layers. The architecture looks like this
![architecture of Xception Net](http://i.imgur.com/9hfdyNA.png)
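As a rough sketch of what one of these residual blocks looks like (assuming TensorFlow/Keras; the layer count, the 728 filters, and the 19x19 feature-map size mirror the middle-flow modules in the figure but are used here purely for illustration):

```python
import tensorflow as tf
from tensorflow.keras import layers

def xception_block(x, filters=728):
    """Residual block of three depthwise separable convolutions (illustrative)."""
    shortcut = x
    for _ in range(3):
        x = layers.ReLU()(x)
        x = layers.SeparableConv2D(filters, 3, padding="same", use_bias=False)(x)
        x = layers.BatchNormalization()(x)
    return layers.Add()([shortcut, x])  # skip connection around the block

inputs = tf.keras.Input(shape=(19, 19, 728))
outputs = xception_block(inputs)
tf.keras.Model(inputs, outputs).summary()
```

A full reference implementation is available in Keras as `tf.keras.applications.Xception`.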
##### Results
Xception Net was trained using hyperparameters tuned for the best performance of Inception V3. On both the internal JFT dataset and ImageNet, Xception outperformed Inception V3. Points to be noted:
- Both Xception & Inception V3 have roughly similar no of parameters (~24 M), hence any improvement in performance can't be attributed to network size
- Xception currently trains slightly slower than Inception V3 (fewer training steps per second), largely because depthwise separable convolution kernels are not yet as heavily optimized; the authors expect this to improve in the future
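A quick way to sanity-check the parameter-count claim (assuming TensorFlow/Keras is installed) is to build the two reference models and compare; both come out in the low twenties of millions of parameters:

```python
import tensorflow as tf

xception = tf.keras.applications.Xception(weights=None)
inception_v3 = tf.keras.applications.InceptionV3(weights=None)

print("Xception parameters:    ", xception.count_params())
print("Inception V3 parameters:", inception_v3.count_params())
```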