Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation
Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen
arXiv e-Print archive - 2018 via Local arXiv
Keywords: cs.CV
First published: 2018/01/13
Abstract: In this paper we describe a new mobile architecture, MobileNetV2, that
improves the state of the art performance of mobile models on multiple tasks
and benchmarks as well as across a spectrum of different model sizes. We also
describe efficient ways of applying these mobile models to object detection in
a novel framework we call SSDLite. Additionally, we demonstrate how to build
mobile semantic segmentation models through a reduced form of DeepLabv3 which
we call Mobile DeepLabv3.
The MobileNetV2 architecture is based on an inverted residual structure where
the input and output of the residual block are thin bottleneck layers opposite
to traditional residual models which use expanded representations in the input.
MobileNetV2 uses lightweight depthwise convolutions to filter features in
the intermediate expansion layer. Additionally, we find that it is important to
remove non-linearities in the narrow layers in order to maintain
representational power. We demonstrate that this improves performance and
provide an intuition that led to this design. Finally, our approach allows
decoupling of the input/output domains from the expressiveness of the
transformation, which provides a convenient framework for further analysis. We
measure our performance on ImageNet classification, COCO object detection, and
VOC image segmentation. We evaluate the trade-offs between accuracy and the
number of operations measured by multiply-adds (MAdd), as well as the number of
parameters.
- **Linear Bottlenecks**. The authors show that even though activations can, in theory, operate in a linear regime, removing the activation from the bottlenecks of a residual network gives a boost in performance (see the sketch after this list).
- **Inverted residuals**. Shortcuts connecting the thin bottleneck layers perform better than shortcuts connecting the expanded layers.
- **SSDLite**. The authors propose replacing the regular convolutions in SSD's prediction layers with depthwise separable convolutions, significantly reducing both the number of parameters and the number of computations with only a minor impact on precision (a sketch of this factorization also follows below).
- **MobileNetV2**. A new architecture, essentially a ResNet with the changes listed above, that outperforms or matches MobileNetV1, ShuffleNet and NASNet at the same number of MACs. Object detection with SSDLite runs on an ARM core in 200 ms. The potential of semantic segmentation on mobile devices is also shown: a network achieves 75.32% mIOU on PASCAL VOC while requiring only 2.75B MACs.
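
To make the first two ideas concrete, here is a minimal PyTorch sketch of an inverted residual block with a linear bottleneck. This is an illustration of the structure, not the paper's reference implementation; names like `expand_ratio` and `hidden` are ours.

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1, expand_ratio=6):
        super().__init__()
        hidden = in_ch * expand_ratio
        # The shortcut connects the thin bottlenecks, not the expanded layers.
        self.use_shortcut = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            # 1x1 expansion: thin bottleneck -> wide representation
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 3x3 depthwise convolution filters the expanded features
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 linear projection back to a thin bottleneck:
            # no activation here, to preserve representational power
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_shortcut else out

x = torch.randn(1, 24, 56, 56)
print(InvertedResidual(24, 24)(x).shape)  # torch.Size([1, 24, 56, 56])
```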
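The SSDLite replacement is the standard depthwise-separable factorization: a k×k regular convolution with C_in input and C_out output channels costs k²·C_in·C_out parameters, while a k×k depthwise convolution followed by a 1×1 pointwise one costs C_in·(k² + C_out). For k=3 and C_in=C_out=256 that is 589,824 vs. 67,840 parameters. A sketch of such a layer (again PyTorch; placing BatchNorm/ReLU6 between the two convolutions is our assumption, not a detail spelled out here):

```python
import torch.nn as nn

def separable_conv(in_ch, out_ch, kernel_size=3):
    """Depthwise-separable stand-in for a regular kxk convolution."""
    return nn.Sequential(
        # depthwise: one kxk filter per input channel (groups=in_ch)
        nn.Conv2d(in_ch, in_ch, kernel_size, padding=kernel_size // 2,
                  groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch),
        nn.ReLU6(inplace=True),
        # pointwise: 1x1 convolution mixes information across channels
        nn.Conv2d(in_ch, out_ch, 1),
    )
```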