First published: 2016/12/25

Abstract: We introduce YOLO9000, a state-of-the-art, real-time object detection system
that can detect over 9000 object categories. First we propose various
improvements to the YOLO detection method, both novel and drawn from prior
work. The improved model, YOLOv2, is state-of-the-art on standard detection
tasks like PASCAL VOC and COCO. At 67 FPS, YOLOv2 gets 76.8 mAP on VOC 2007. At
40 FPS, YOLOv2 gets 78.6 mAP, outperforming state-of-the-art methods like
Faster RCNN with ResNet and SSD while still running significantly faster.
Finally we propose a method to jointly train on object detection and
classification. Using this method we train YOLO9000 simultaneously on the COCO
detection dataset and the ImageNet classification dataset. Our joint training
allows YOLO9000 to predict detections for object classes that don't have
labelled detection data. We validate our approach on the ImageNet detection
task. YOLO9000 gets 19.7 mAP on the ImageNet detection validation set despite
only having detection data for 44 of the 200 classes. On the 156 classes not in
COCO, YOLO9000 gets 16.0 mAP. But YOLO can detect more than just 200 classes;
it predicts detections for more than 9000 different object categories. And it
still runs in real-time.

_Objective:_ Train on both classification and detection images to build a better, faster, and stronger detector.

_Dataset:_ [ImageNet](http://www.image-net.org/), [COCO](http://mscoco.org/) and [WordNet](https://wordnet.princeton.edu/).
## Architecture:
Apart from refinements such as batch normalization and other general tweaks, the real improvements come from:
1. Using both a classification dataset and a detection dataset at the same time.
2. Replacing the usual final softmax layer (which assumes that all labels are mutually exclusive) with a WordTree label hierarchy based on WordNet, which lets the network predict `dog` even when it cannot tell whether the dog is a `Fox Terrier`.
[![screen shot 2017-04-12 at 7 24 28 pm](https://cloud.githubusercontent.com/assets/17261080/24970727/b7abaf02-1fb5-11e7-8b78-2a430a861cbd.png)](https://cloud.githubusercontent.com/assets/17261080/24970727/b7abaf02-1fb5-11e7-8b78-2a430a861cbd.png)
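
The WordTree idea can be illustrated with a small sketch. It assumes a toy hierarchy and made-up scores (the `PARENT` table, the `logits` values, and all function names are hypothetical, not taken from the paper's code). Each group of siblings gets its own softmax, which yields a conditional probability given the parent; the probability of a node is the product of the conditionals along its path to the root. The stopping rule in `predict` (stop when the cumulative probability drops below a threshold) is one plausible reading of the approach, not necessarily the exact rule the authors use.

```python
import math

# Hypothetical toy hierarchy: child -> parent (None marks the root).
PARENT = {
    "physical object": None,
    "animal": "physical object",
    "vehicle": "physical object",
    "dog": "animal",
    "cat": "animal",
    "fox terrier": "dog",
    "husky": "dog",
}

def sibling_groups(parent):
    """Map each parent to its list of children (one softmax per group)."""
    groups = {}
    for node, par in parent.items():
        if par is not None:
            groups.setdefault(par, []).append(node)
    return groups

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def conditional_probs(logits, parent):
    """P(node | parent), computed with a separate softmax per sibling group."""
    cond = {}
    for children in sibling_groups(parent).values():
        probs = softmax([logits[c] for c in children])
        cond.update(zip(children, probs))
    return cond

def absolute_prob(node, cond, parent):
    """P(node) as the product of conditionals along the path to the root."""
    p = 1.0
    while parent[node] is not None:
        p *= cond[node]
        node = parent[node]
    return p

def predict(cond, parent, threshold=0.5):
    """Descend from the root along the most likely child at each split.

    Assumption: stop as soon as the cumulative probability of the next
    step would fall below `threshold`, and return the last node reached.
    """
    groups = sibling_groups(parent)
    node = next(n for n, p in parent.items() if p is None)
    prob = 1.0
    while node in groups:
        best = max(groups[node], key=lambda c: cond[c])
        if prob * cond[best] < threshold:
            break
        prob *= cond[best]
        node = best
    return node, prob

# Made-up raw scores: confident about "animal" and "dog", unsure about breed.
logits = {"animal": 3.0, "vehicle": 0.0, "dog": 2.0, "cat": 0.0,
          "fox terrier": 0.1, "husky": 0.0}
cond = conditional_probs(logits, PARENT)
label, prob = predict(cond, PARENT)
print(label, round(prob, 3))
```

With these toy scores the walk stops at `dog`: the network is confident about the coarse class, but the breed-level softmax is close to uniform, so descending further would push the cumulative probability below the threshold.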
## Results:
State-of-the-art results at full resolution, with the option to trade some accuracy for a shorter computation time.
[![screen shot 2017-04-12 at 7 31 26 pm](https://cloud.githubusercontent.com/assets/17261080/24971010/a51556f8-1fb6-11e7-9289-fc277b182686.png)](https://cloud.githubusercontent.com/assets/17261080/24971010/a51556f8-1fb6-11e7-9289-fc277b182686.png)
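
To put the trade-off in concrete terms, the two VOC 2007 operating points quoted in the abstract can be converted to a per-frame time budget. This is only arithmetic on those reported numbers; the names `operating_points` and `latency_ms` are made up for the illustration.

```python
# Operating points quoted in the abstract (VOC 2007): (FPS, mAP).
operating_points = {"fast": (67, 76.8), "accurate": (40, 78.6)}

def latency_ms(fps):
    """Per-frame time budget in milliseconds at a given frame rate."""
    return 1000.0 / fps

for name, (fps, m_ap) in operating_points.items():
    print(f"{name}: {fps} FPS = {latency_ms(fps):.1f} ms/frame, {m_ap} mAP")

# Accuracy gained and time spent when moving from "fast" to "accurate".
map_gain = operating_points["accurate"][1] - operating_points["fast"][1]
extra_ms = latency_ms(40) - latency_ms(67)
print(f"+{map_gain:.1f} mAP costs about {extra_ms:.1f} ms more per frame")
```

Under these numbers, the slower setting buys 1.8 mAP for roughly 10 ms of extra compute per frame.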