In the paper, the authors (Bengio et al.) use DenseNets for semantic segmentation. A DenseNet iteratively concatenates input feature maps to output feature maps. The paper's biggest contribution is a novel upsampling path, since upsampling in the conventional way would have caused a severe memory crunch.

#### Background

Fully convolutional semantic segmentation networks generally follow a conventional layout: a downsampling path that acts as a feature extractor, and an upsampling path that restores the spatial information of the features extracted in the downsampling path. As opposed to Residual Nets (where input feature maps are added to the output), in DenseNets the output is concatenated to the input, which has some interesting implications:

- DenseNets are parameter-efficient, since all feature maps are reused
- DenseNets perform deep supervision thanks to the short paths to all feature maps in the architecture

Using DenseNets for segmentation, however, runs into a problem if upsampling is done in the conventional way of concatenating feature maps through skip connections: the number of feature maps could easily grow beyond 1-1.5K. So Bengio et al. suggest a novel approach in which only the feature maps produced by the last dense block are upsampled, not the full stack of feature maps. After upsampling, the output is concatenated with the same-resolution feature maps from the downsampling path via a skip connection. That way, the information lost during pooling in the downsampling path can be recovered.

#### Methodology & Architecture

In the downsampling path, the input is concatenated with the output of a dense block, whereas in the upsampling path only the output of the dense block is upsampled (without concatenating it with its input) and then concatenated with the same-resolution output of the downsampling path (see the sketches at the end of this summary).

Here's the overall architecture:

![](https://i.imgur.com/tqsPj72.png)

Here's what a Dense Block looks like:

![](https://i.imgur.com/MMqosoj.png)

#### Results

The 103-convolutional-layer DenseNet (FC-DenseNet103) performed better than shallower networks when compared on the CamVid dataset, even though the FC-DenseNets were not pre-trained and did not use any post-processing such as CRFs or temporal smoothing. Compared to other networks, the FC-DenseNet architectures achieve state-of-the-art results, improving upon models with 10 times more parameters. It is also worth mentioning that the small FC-DenseNet56 model already outperforms popular architectures with at least 100 times more parameters.
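To make the dense block described in the Background section concrete, here's a minimal PyTorch sketch (my own illustration, not the authors' code; layer structure and channel counts are assumptions). Each layer sees the concatenation of the block input and all previously produced feature maps, and the block returns only the newly produced maps:

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Sequential):
    """BN -> ReLU -> 3x3 Conv producing `growth_rate` new feature maps."""
    def __init__(self, in_channels, growth_rate):
        super().__init__(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1),
        )

class DenseBlock(nn.Module):
    """Each layer receives the concatenation of the input and all previous
    feature maps; the block returns only the newly produced maps."""
    def __init__(self, in_channels, growth_rate, n_layers):
        super().__init__()
        self.layers = nn.ModuleList(
            DenseLayer(in_channels + i * growth_rate, growth_rate)
            for i in range(n_layers)
        )

    def forward(self, x):
        new_features = []
        for layer in self.layers:
            out = layer(torch.cat([x] + new_features, dim=1))
            new_features.append(out)
        return torch.cat(new_features, dim=1)  # only the new maps

# Toy usage (shapes are illustrative):
block = DenseBlock(in_channels=48, growth_rate=16, n_layers=4)
x = torch.randn(1, 48, 32, 32)
new = block(x)                       # 4 * 16 = 64 new maps
down_out = torch.cat([x, new], 1)    # downsampling path: 48 + 64 = 112 maps
```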
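And here's a rough sketch of the transition-up step from the Methodology section: only the new feature maps of the last dense block are upsampled, then concatenated with the same-resolution skip connection from the downsampling path (again an illustrative sketch under assumed shapes, not the paper's implementation):

```python
import torch
import torch.nn as nn

class TransitionUp(nn.Module):
    """Upsample ONLY the new feature maps produced by the previous dense block
    (not the full concatenated stack), then concatenate with the skip
    connection from the same-resolution stage of the downsampling path."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        # 3x3 transposed conv with stride 2 doubles the spatial resolution
        self.deconv = nn.ConvTranspose2d(in_channels, out_channels,
                                         kernel_size=3, stride=2,
                                         padding=1, output_padding=1)

    def forward(self, new_features, skip):
        up = self.deconv(new_features)       # recover resolution lost by pooling
        return torch.cat([up, skip], dim=1)  # skip restores spatial detail

# Toy usage (shapes are illustrative):
new_features = torch.randn(1, 64, 16, 16)   # output of the deepest dense block
skip = torch.randn(1, 112, 32, 32)          # same-resolution maps from the down path
tu = TransitionUp(64, 64)
out = tu(new_features, skip)                # -> (1, 64 + 112, 32, 32)
```

Because only the newly produced feature maps are upsampled, the channel count on the upsampling path stays bounded, which is exactly what avoids the memory blow-up mentioned above.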