Object Detectors Emerge in Deep Scene CNNs
Bolei Zhou
and
Aditya Khosla
and
Agata Lapedriza
and
Aude Oliva
and
Antonio Torralba
arXiv e-Print archive - 2014 via Local arXiv
Keywords:
cs.CV, cs.NE
First published: 2014/12/22 (9 years ago) Abstract: With the success of new computational architectures for visual processing,
such as convolutional neural networks (CNN) and access to image databases with
millions of labeled examples (e.g., ImageNet, Places), the state of the art in
computer vision is advancing rapidly. One important factor for continued
progress is to understand the representations that are learned by the inner
layers of these deep architectures. Here we show that object detectors emerge
from training CNNs to perform scene classification. As scenes are composed of
objects, the CNN for scene classification automatically discovers meaningful
objects detectors, representative of the learned scene categories. With object
detectors emerging as a result of learning to recognize scenes, our work
demonstrates that the same network can perform both scene recognition and
object localization in a single forward-pass, without ever having been
explicitly taught the notion of objects.