The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation
Simon Jégou
and
Michal Drozdzal
and
David Vazquez
and
Adriana Romero
and
Yoshua Bengio
arXiv e-Print archive - 2016 via Local arXiv
Keywords:
cs.CV
First published: 2016/11/28 (7 years ago) Abstract: State-of-the-art approaches for semantic image segmentation are built on
Convolutional Neural Networks (CNNs). The typical segmentation architecture is
composed of (a) a downsampling path responsible for extracting coarse semantic
features, followed by (b) an upsampling path trained to recover the input image
resolution at the output of the model and, optionally, (c) a post-processing
module (e.g. Conditional Random Fields) to refine the model predictions.
Recently, a new CNN architecture, Densely Connected Convolutional Networks
(DenseNets), has shown excellent results on image classification tasks. The
idea of DenseNets is based on the observation that if each layer is directly
connected to every other layer in a feed-forward fashion then the network will
be more accurate and easier to train.
In this paper, we extend DenseNets to deal with the problem of semantic
segmentation. We achieve state-of-the-art results on urban scene benchmark
datasets such as CamVid and Gatech, without any further post-processing module
nor pretraining. Moreover, due to smart construction of the model, our approach
has much less parameters than currently published best entries for these
datasets.