[link]
This work expands on prior techniques for designing models that can be stored with fewer parameters and executed with fewer operations and less memory, both key desiderata for running trained machine learning models on phones and other personal devices. The main contribution of the original MobileNets paper was the idea of "factored" decompositions into depthwise and pointwise convolutions, which separate the procedures of "pull information from a spatial range" and "mix information across channels" into two distinct steps. This paper keeps that basic depthwise infrastructure, but adds a new design element: the inverted-residual linear bottleneck.

The reasoning behind this new layer type comes from the observation that the set of relevant points in a high-dimensional space (such as the per-pixel activations inside a conv net) often actually lives on a lower-dimensional manifold. So, theoretically and naively, one could just use lower-dimensional internal representations that match the dimensionality of that assumed manifold. However, the authors argue that ReLU non-linearities destroy information (because of the region where all inputs are mapped to zero), so layers with only the number of dimensions needed for the manifold would leave you with too few dimensions after the ReLU's information loss. At the same time, the network needs non-linearities somewhere in order to learn complex, non-linear functions. The authors therefore suggest a method to mostly use lower-dimensional representations internally while still keeping ReLUs and the complexity the network needs:

https://i.imgur.com/pN4d9Wi.png

- A lower-dimensional input is "projected up" into a higher-dimensional representation
- A ReLU is applied on this higher-dimensional layer
- That layer is then projected back down into a smaller-dimensional layer, which uses a linear activation to avoid information loss
- A residual connection links the lower-dimensional representations at the beginning and end of the expansion

This way, we still maintain the network's non-linearity, but also replace some of the network's higher-dimensional layers with lower-dimensional linear ones.
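To make the expand → ReLU → linear-project → residual pattern concrete, here is a minimal sketch of an inverted-residual linear bottleneck in PyTorch. The class name, the expansion factor of 6, and the use of ReLU6 with BatchNorm are assumptions for illustration, not the authors' exact code.

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """Sketch of an inverted-residual linear bottleneck (assumed layout)."""
    def __init__(self, in_ch, out_ch, stride=1, expansion=6):
        super().__init__()
        hidden = in_ch * expansion
        # Residual only when spatial size and channel count are preserved
        self.use_residual = (stride == 1 and in_ch == out_ch)
        self.block = nn.Sequential(
            # 1x1 "expansion": project the low-dimensional input up
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 3x3 depthwise conv: gather spatial information per channel
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 "projection" back down, with a *linear* activation (no ReLU)
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        # Shortcut connects the two low-dimensional bottleneck ends
        return x + self.block(x) if self.use_residual else self.block(x)
```

Note how the non-linearities only touch the expanded (high-dimensional) tensor, while the bottleneck itself stays linear and carries the residual connection.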
[link]
- **Linear bottlenecks**. The authors show that, even though a non-linear activation could in theory learn to operate in a linear regime, explicitly removing the activation from the bottlenecks of the residual network gives a boost to performance.
- **Inverted residuals**. Shortcuts connecting the bottlenecks perform better than shortcuts connecting the expanded layers.
- **SSDLite**. The authors propose replacing the convolutions in SSD with depthwise separable convolutions, significantly reducing both the number of parameters and the number of calculations, with minor impact on precision (see the sketch after this list).
- **MobileNetV2**. A new architecture, essentially a ResNet with the changes mentioned above, outperforms or shows comparable performance to MobileNetV1, ShuffleNet and NASNet for the same number of MACs. Object detection with SSDLite can be run on an ARM core in 200ms. The potential of semantic segmentation on mobile devices is also shown: a network achieving 75.32% mIOU on PASCAL while requiring only 2.75B MACs.
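To illustrate the SSDLite idea, here is a rough sketch of replacing a standard 3x3 convolution in an SSD prediction head with a depthwise-separable pair (3x3 depthwise followed by 1x1 pointwise). The function name and the BatchNorm/ReLU6 placement are assumptions, not the paper's exact configuration.

```python
import torch.nn as nn

def ssdlite_head(in_ch, out_ch):
    """Assumed SSDLite-style prediction head: depthwise 3x3 + pointwise 1x1."""
    return nn.Sequential(
        # Depthwise 3x3: spatial filtering, one filter per input channel
        nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch),
        nn.ReLU6(inplace=True),
        # Pointwise 1x1: mixes channels and produces the box/class predictions
        nn.Conv2d(in_ch, out_ch, 1),
    )
```

The parameter and MAC savings come from the same factorization as in the backbone: the depthwise step costs roughly `k*k*in_ch` per position instead of `k*k*in_ch*out_ch` for a full convolution.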