Fully Convolutional Networks for Semantic Segmentation
Jonathan Long
and
Evan Shelhamer
and
Trevor Darrell
arXiv e-Print archive - 2014 via Local arXiv
Keywords:
cs.CV
First published: 2014/11/14 (10 years ago) Abstract: Convolutional networks are powerful visual models that yield hierarchies of
features. We show that convolutional networks by themselves, trained
end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic
segmentation. Our key insight is to build "fully convolutional" networks that
take input of arbitrary size and produce correspondingly-sized output with
efficient inference and learning. We define and detail the space of fully
convolutional networks, explain their application to spatially dense prediction
tasks, and draw connections to prior models. We adapt contemporary
classification networks (AlexNet, the VGG net, and GoogLeNet) into fully
convolutional networks and transfer their learned representations by
fine-tuning to the segmentation task. We then define a novel architecture that
combines semantic information from a deep, coarse layer with appearance
information from a shallow, fine layer to produce accurate and detailed
segmentations. Our fully convolutional network achieves state-of-the-art
segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012),
NYUDv2, and SIFT Flow, while inference takes one third of a second for a
typical image.