Rethinking the Smaller-Norm-Less-Informative Assumption in Channel Pruning of Convolution Layers
Jianbo Ye, Xin Lu, Zhe Lin, and James Z. Wang
arXiv e-Print archive - 2018
Keywords:
cs.LG
First published: 2018/02/01
Abstract: Model pruning has become a useful technique for improving the computational efficiency of deep learning, making it possible to deploy solutions in resource-constrained scenarios. A widely used practice in related work assumes that a parameter or feature with a smaller norm plays a less informative role at inference time. In this paper, we propose a channel pruning technique for accelerating the computations of deep convolutional neural networks (CNNs) that does not critically rely on this assumption. Instead, it focuses on directly simplifying the channel-to-channel computation graph of a CNN, without performing the computationally difficult, and not always useful, task of making the high-dimensional tensors of a CNN structured-sparse. Our approach takes two stages: the first adopts an end-to-end stochastic training method that eventually forces the outputs of some channels to be constant, and the second prunes those constant channels from the original neural network by adjusting the biases of the layers they affect, so that the resulting compact model can be quickly fine-tuned. Our approach is mathematically appealing from an optimization perspective and easy to reproduce. We evaluate our approach on several image learning benchmarks and demonstrate its interesting aspects and competitive performance.
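
To make the second stage more concrete, here is a minimal PyTorch sketch of how a constant channel could be removed and its constant output folded into the bias of the following layer. This is an illustrative reconstruction, not the authors' released code: the layer names (`conv1`, `bn1`, `conv2`), the zero-gamma criterion for detecting constant channels, and the assumption that a ReLU follows the batch normalization are all assumptions made for this example.

```python
import torch
import torch.nn as nn

def prune_constant_channels(conv1, bn1, conv2, eps=1e-8):
    """Drop channels of (conv1, bn1) whose BN scale gamma is numerically zero.

    When gamma_k == 0, channel k of bn1's output equals the constant beta_k
    (max(beta_k, 0) if a ReLU follows).  Its contribution to conv2 is then a
    fixed offset, which is folded into conv2's bias before the channel is
    removed from both layers.
    """
    gamma = bn1.weight.data
    keep = gamma.abs() > eps            # surviving channels
    # Constant post-ReLU value of each pruned channel (zero for kept ones).
    const = torch.relu(bn1.bias.data.clone())
    const[keep] = 0.0

    # Fold constants into conv2's bias: each pruned input channel k adds
    # const_k * sum(W2[j, k, :, :]) to output unit j.  (Exact away from
    # zero-padded borders; fine-tuning can absorb the border approximation.)
    w2 = conv2.weight.data              # [out_c, in_c, kH, kW]
    offset = (w2 * const.view(1, -1, 1, 1)).sum(dim=(1, 2, 3))
    if conv2.bias is None:
        conv2.bias = nn.Parameter(torch.zeros(w2.size(0)))
    conv2.bias.data += offset

    # Rebuild the three layers with only the surviving channels.
    idx = keep.nonzero(as_tuple=True)[0]
    slim_conv1 = nn.Conv2d(conv1.in_channels, len(idx), conv1.kernel_size,
                           conv1.stride, conv1.padding,
                           bias=conv1.bias is not None)
    slim_conv1.weight.data = conv1.weight.data[idx].clone()
    if conv1.bias is not None:
        slim_conv1.bias.data = conv1.bias.data[idx].clone()

    slim_bn1 = nn.BatchNorm2d(len(idx))
    for name in ("weight", "bias", "running_mean", "running_var"):
        getattr(slim_bn1, name).data = getattr(bn1, name).data[idx].clone()

    slim_conv2 = nn.Conv2d(len(idx), conv2.out_channels, conv2.kernel_size,
                           conv2.stride, conv2.padding, bias=True)
    slim_conv2.weight.data = conv2.weight.data[:, idx].clone()
    slim_conv2.bias.data = conv2.bias.data.clone()
    return slim_conv1, slim_bn1, slim_conv2


# Example usage on a toy conv -> BN -> ReLU -> conv block:
conv1 = nn.Conv2d(3, 8, 3, padding=1)
bn1 = nn.BatchNorm2d(8)
conv2 = nn.Conv2d(8, 16, 3, padding=1)
with torch.no_grad():
    bn1.weight[::2] = 0.0               # pretend training zeroed half the gammas
conv1, bn1, conv2 = prune_constant_channels(conv1, bn1, conv2)
print(conv1.out_channels, bn1.num_features, conv2.in_channels)  # -> 4 4 4
```

Folding the constant output into the next layer's bias is exact in the interior of the feature map; near zero-padded borders it is only approximate, which the quick fine-tuning mentioned in the abstract can compensate for.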