End-to-End Instance Segmentation and Counting with Recurrent Attention
Mengye Ren and Richard S. Zemel
arXiv e-Print archive - 2016 via Local arXiv
Keywords: cs.LG, cs.CV
First published: 2016/05/30
Abstract: While convolutional neural networks have gained impressive success recently
in solving structured prediction problems such as semantic segmentation, it
remains a challenge to differentiate individual object instances in the scene.
Instance segmentation is very important in a variety of applications, such as
autonomous driving, image captioning, and visual question answering. Techniques
that combine large graphical models with low-level vision have been proposed to
address this problem; however, we propose an end-to-end recurrent neural
network (RNN) architecture with an attention mechanism to model a human-like
counting process, and produce detailed instance segmentations. The network is
jointly trained to sequentially produce regions of interest as well as a
dominant object segmentation within each region. The proposed model achieves
state-of-the-art results on the CVPPP leaf segmentation dataset and KITTI
vehicle segmentation dataset.
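The sequential process described in the abstract can be sketched as a loop that emits one instance (a box plus a mask) per step and stops once a confidence score falls below a threshold, so the number of steps taken is the count. This is a toy under stated assumptions, not the paper's model: the learned region, segmentation, and stopping components are replaced by a hypothetical `propose_instance` that flood-fills the next unsegmented connected component of a binary image. The `canvas` of already-segmented pixels, the 0.5 `threshold`, and `segment_and_count` are likewise illustrative choices.

```python
from collections import deque


def propose_instance(image, canvas):
    """Hypothetical stand-in for the learned networks (not the paper's model).

    Finds the first foreground pixel not yet on the canvas, flood-fills its
    4-connected component, and returns (box, mask, score). The score plays the
    role of the model's "is there another object?" confidence.
    """
    h, w = len(image), len(image[0])
    for r in range(h):
        for c in range(w):
            if image[r][c] and not canvas[r][c]:
                mask = [[0] * w for _ in range(h)]
                mask[r][c] = 1
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w and image[ny][nx]
                                and not canvas[ny][nx] and not mask[ny][nx]):
                            mask[ny][nx] = 1
                            queue.append((ny, nx))
                cells = [(y, x) for y in range(h) for x in range(w) if mask[y][x]]
                ys = [y for y, _ in cells]
                xs = [x for _, x in cells]
                box = (min(ys), min(xs), max(ys), max(xs))  # (top, left, bottom, right)
                return box, mask, 1.0
    return None, None, 0.0  # nothing left: a low score signals "stop"


def segment_and_count(image, threshold=0.5, max_steps=20):
    """Emit one instance per step until the score drops below `threshold`."""
    h, w = len(image), len(image[0])
    canvas = [[0] * w for _ in range(h)]  # pixels already assigned to an instance
    instances = []
    for _ in range(max_steps):
        box, mask, score = propose_instance(image, canvas)
        if score < threshold:
            break
        instances.append((box, mask))
        for y in range(h):
            for x in range(w):
                canvas[y][x] |= mask[y][x]  # mark these pixels as counted
    return len(instances), instances


if __name__ == "__main__":
    img = [
        [1, 1, 0, 0, 0],
        [1, 1, 0, 0, 1],
        [0, 0, 0, 0, 1],
        [0, 1, 0, 0, 0],
    ]
    count, instances = segment_and_count(img)
    print(count)                          # 3
    print([box for box, _ in instances])  # [(0, 0, 1, 1), (1, 4, 2, 4), (3, 1, 3, 1)]
```

The flood fill only gives the loop something concrete to iterate over; in the paper each step is a learned prediction and the whole sequence is trained jointly, which a hand-coded stand-in cannot capture.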
This combines recurrent visual attention for recognizing objects in an image \cite{1406.6247}, extended to multiple objects \cite{1412.7755}, with semantic segmentation \cite{1505.04366}.
Segmenting within an attended subregion rather than the whole image avoids a bias tied to the global image resolution (inside the subregion, the object takes up the majority of the pixels) and allows objects at multiple scales to be segmented.
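The scale argument can be checked with a little arithmetic. The sketch below assumes nearest-neighbour resampling onto a fixed 8x8 window as a simple stand-in for the network's learned, differentiable attention; the helper names (`crop_and_resample`, `paste_back`, `foreground_fraction`, `square_mask`), the box margins, and the window size are all hypothetical. A 2x2 and an 8x8 object in a 20x20 image cover 1% and 16% of the full frame, but each covers 25% of its own window, so the segmenter sees both at a comparable scale.

```python
def crop_and_resample(mask, box, size):
    """Sample the box region of `mask` onto a fixed size x size grid.

    Nearest neighbour: each output cell reads the source pixel nearest its centre.
    `box` is (top, left, bottom, right) with inclusive coordinates.
    """
    top, left, bottom, right = box
    bh, bw = bottom - top + 1, right - left + 1
    return [
        [mask[top + (2 * i + 1) * bh // (2 * size)][left + (2 * j + 1) * bw // (2 * size)]
         for j in range(size)]
        for i in range(size)
    ]


def paste_back(window, box, height, width):
    """Map a size x size window prediction back into a full-resolution frame."""
    top, left, bottom, right = box
    bh, bw = bottom - top + 1, right - left + 1
    size = len(window)
    full = [[0] * width for _ in range(height)]
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            full[y][x] = window[(y - top) * size // bh][(x - left) * size // bw]
    return full


def foreground_fraction(mask):
    """Share of pixels in `mask` that belong to the object."""
    return sum(map(sum, mask)) / sum(len(row) for row in mask)


def square_mask(height, width, top, left, side):
    """Binary frame containing one side x side square object."""
    return [[int(top <= y < top + side and left <= x < left + side) for x in range(width)]
            for y in range(height)]


if __name__ == "__main__":
    H = W = 20
    small = square_mask(H, W, 2, 2, 2)   # 2x2 object
    large = square_mask(H, W, 8, 8, 8)   # 8x8 object
    small_box = (1, 1, 4, 4)             # object plus a 1-pixel margin
    large_box = (4, 4, 19, 19)           # object plus a 4-pixel margin
    print(foreground_fraction(small), foreground_fraction(large))  # 0.01 0.16
    for m, b in ((small, small_box), (large, large_box)):
        print(foreground_fraction(crop_and_resample(m, b, 8)))     # 0.25 for both
```

In this toy the window-to-frame round trip through `paste_back` recovers each original mask exactly; with a learned, smooth attention the mapping back would generally be approximate.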
A video demonstrating the method described in the paper:
https://youtu.be/BMVDhTjEfBU