This paper presents a object detection algorithm that improves mAP on PASCAL VOC dataset by over 20% to previous state-of-the-art. Unlike image classification which take an image or the center part of an image as input, object detection task requires an algorithm to detect bounding boxes of objects in an image. To use the high capacity CNN features in object detection, the proposed algorithm first generates region proposals. CNN features are extracted from those region proposals and are feed to a set of class-specific linear SVMs which tell whether objects are detected in those regions.
The figure below show the object detection system in this paper.
Because the PASCAL VOC dataset is not large enough for training high capacity CNN features, this paper use supervised pre-training on a large auxiliary dataset (ILSVRC 2012). The CNN is then fine-tuned with a portion of the PASCAL VOC dataset.
The following table shows the detection mAP on VOC 2007 test.