Residual Networks of Residual Networks: Multilevel Residual Networks
Tony X. Han
arXiv e-Print archive - 2016 via Local arXiv
First published: 2016/08/09
Abstract: Residual networks family with hundreds or even thousands of layers dominate
major image recognition tasks, but building a network by simply stacking
residual blocks inevitably limits its optimization ability. This paper proposes
a novel residual-network architecture, Residual networks of Residual networks
(RoR), to dig the optimization ability of residual networks. RoR substitutes
optimizing residual mapping of residual mapping for optimizing original
residual mapping, in particular, adding level-wise shortcut connections upon
original residual networks, to promote the learning capability of residual
networks. More importantly, RoR can be applied to various kinds of residual
networks (Pre-ResNets and WRN) and significantly boost their performance. Our
experiments demonstrate the effectiveness and versatility of RoR, where it
achieves the best performance in all residual-network-like structures. Our
RoR-3-WRN58-4 models achieve new state-of-the-art results on CIFAR-10,
CIFAR-100 and SVHN, with test errors 3.77%, 19.73% and 1.59% respectively.
These results outperform 1001-layer Pre-ResNets by 18.4% on CIFAR-10 and 13.1%
This paper modifies the ResNet architecture by adding multi-level shortcut connections: a level-1 shortcut from the input to the pre-final layer, a level-2 shortcut over each group of residual blocks, and so on, in contrast to the single-level shortcuts used in prior work on ResNets. The authors evaluate multi-level shortcuts on regular ResNets, ResNets with pre-activations (Pre-ResNets) and Wide ResNets (WRN). Combined with drop-path regularization via stochastic depth, and with a search over the number of shortcut levels and the depth/width ratio to avoid vanishing gradients and overfitting, the architecture achieves state-of-the-art error rates on CIFAR-10 (3.77%), CIFAR-100 (19.73%) and SVHN (1.59%).
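The nesting of shortcut levels described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a constant feature width so that every shortcut is a plain identity, replaces convolutions and batch normalization with toy dense layers, and omits stochastic depth. The names `residual_block`, `block_group`, `ror_forward` and `make_weights` are hypothetical.

```python
import numpy as np

def residual_block(x, weights):
    """Innermost shortcut (level 3 here): x + F(x), with F a toy two-layer map."""
    w1, w2 = weights
    h = np.maximum(x @ w1, 0.0)   # stand-in for conv + ReLU (assumption)
    return x + h @ w2             # shortcut over a single residual block

def block_group(x, group_weights):
    """Level-2 shortcut: spans all residual blocks in one group."""
    out = x
    for weights in group_weights:
        out = residual_block(out, weights)
    return x + out

def ror_forward(x, all_group_weights):
    """Level-1 shortcut: spans all groups, from the input to the pre-final layer."""
    out = x
    for group_weights in all_group_weights:
        out = block_group(out, group_weights)
    return x + out

def make_weights(dim, n_groups, n_blocks, seed=0):
    """Hypothetical helper: small random weights for every block in every group."""
    rng = np.random.default_rng(seed)
    return [
        [(0.1 * rng.standard_normal((dim, dim)),
          0.1 * rng.standard_normal((dim, dim)))
         for _ in range(n_blocks)]
        for _ in range(n_groups)
    ]

x = np.ones((2, 4))
y = ror_forward(x, make_weights(dim=4, n_groups=3, n_blocks=2))
print(y.shape)
```

Removing the `x +` term in `block_group` and `ror_forward` reduces the sketch to an ordinary single-level ResNet, which makes clear that the extra levels add shortcut paths without adding any parameters in this identity-shortcut setting.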
- Fairly exhaustive set of experiments over:
  - Number of shortcut levels.
  - Identity mapping types: 1) zero-padding shortcuts, 2) 1x1 convolutions for projection shortcuts with identity for the rest, and 3) 1x1 convolutions for all shortcuts.
  - Residual block size (two or three 3x3 convolutional layers).
  - Depths (110, 164, 182, 218) and widths for both ResNets and Pre-ResNets.
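
To make the shortcut types in the list above concrete, here is a rough NumPy sketch of a zero-padding shortcut and a 1x1-convolution projection shortcut. It assumes an `(N, H, W, C)` layout and a stride-2 subsampling where the feature map shrinks; both choices and the function names are illustrative, not taken from the paper.

```python
import numpy as np

def zero_pad_shortcut(x, out_channels, stride=2):
    """Parameter-free shortcut: subsample spatially, pad new channels with zeros."""
    x = x[:, ::stride, ::stride, :]            # (N, H, W, C) layout assumed
    extra = out_channels - x.shape[-1]
    return np.pad(x, ((0, 0), (0, 0), (0, 0), (0, extra)))

def projection_shortcut(x, w, stride=2):
    """1x1 convolution shortcut: a learned per-pixel linear map over channels."""
    x = x[:, ::stride, ::stride, :]
    return x @ w                               # w has shape (C_in, C_out)

x = np.ones((1, 8, 8, 16))
print(zero_pad_shortcut(x, 32).shape)
print(projection_shortcut(x, np.eye(16, 32)).shape)
```

With `w` set to a rectangular identity matrix the projection reproduces the zero-padding shortcut exactly, so the zero-padding variant can be read as a fixed, non-learned special case of the projection.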