Improves on [R-CNN](https://arxiv.org/abs/1311.2524) and [SPPnet](https://arxiv.org/abs/1406.4729) with simpler and faster training.
Region-based Convolutional Neural Network (R-CNN) basically takes as input an image and several candidate objects (the Regions of Interest, or RoIs) and scores each of them.
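Scoring each RoI amounts to turning a vector of per-class scores into probabilities and keeping the most likely class. The sketch below assumes Fast R-CNN's softmax head over the object classes plus a background class at index 0 (the original R-CNN used per-class linear SVMs instead). The names `softmax` and `score_rois` are hypothetical, and the logits would come from the network.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_rois(class_logits: List[List[float]]) -> List[Tuple[int, float]]:
    """For each RoI, return (best class index, its probability).

    Assumption: index 0 is the background class, indices 1..K are objects.
    """
    results = []
    for logits in class_logits:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((best, probs[best]))
    return results


# Two RoIs over 3 classes (background, class 1, class 2).
print(score_rois([[0.0, 2.0, 0.0], [3.0, 0.0, 0.0]]))
```

The first RoI is assigned class 1 and the second is classified as background, which in practice means it is discarded.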
## Architecture:
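The feature map is computed once for the whole image. Then, for each region of interest, an RoI max-pooling layer extracts a new fixed-length feature vector from the corresponding window of the feature map. From each vector, fully connected layers make two sibling predictions: class scores (a softmax over the object classes plus background) and bounding-box offsets.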
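RoI max pooling divides an RoI's window on the feature map into a fixed grid of roughly equal sub-windows and takes the maximum in each, so RoIs of any size produce an output of the same shape (the paper uses a 7×7 grid for VGG16). The sketch below is a minimal single-channel version. The name `roi_max_pool`, the list-of-lists layout, the inclusive `(x0, y0, x1, y1)` box format and the `spatial_scale` parameter (for example 1/16 for a stride-16 backbone) are illustrative assumptions. A real layer applies the same pooling to every channel independently.

```python
import math
from typing import List, Sequence

FeatureMap = List[List[float]]  # hypothetical single-channel H x W map


def roi_max_pool(
    feature_map: FeatureMap,
    roi: Sequence[float],
    out_h: int = 2,
    out_w: int = 2,
    spatial_scale: float = 1.0,
) -> FeatureMap:
    """Max-pool one RoI (x0, y0, x1, y1, inclusive, image coords) into out_h x out_w cells."""
    height, width = len(feature_map), len(feature_map[0])
    # Project the RoI from image coordinates onto the feature map grid.
    x0, y0, x1, y1 = (round(c * spatial_scale) for c in roi)
    roi_h = max(y1 - y0 + 1, 1)
    roi_w = max(x1 - x0 + 1, 1)
    bin_h = roi_h / out_h  # approximate sub-window height
    bin_w = roi_w / out_w  # approximate sub-window width

    pooled = []
    for ph in range(out_h):
        # Rows covered by this sub-window, clipped to the feature map.
        hs = min(max(y0 + math.floor(ph * bin_h), 0), height)
        he = min(max(y0 + math.ceil((ph + 1) * bin_h), 0), height)
        row = []
        for pw in range(out_w):
            # Columns covered by this sub-window, clipped to the feature map.
            ws = min(max(x0 + math.floor(pw * bin_w), 0), width)
            we = min(max(x0 + math.ceil((pw + 1) * bin_w), 0), width)
            values = [feature_map[i][j] for i in range(hs, he) for j in range(ws, we)]
            # An empty sub-window (RoI outside the map) yields 0.
            row.append(max(values) if values else 0.0)
        pooled.append(row)
    return pooled


fmap = [[4 * i + j for j in range(4)] for i in range(4)]  # values 0..15
print(roi_max_pool(fmap, (0, 0, 3, 3)))  # whole map -> 2x2 grid
print(roi_max_pool(fmap, (1, 1, 3, 3)))  # 3x3 RoI -> still 2x2 grid
```

Both calls return a 2×2 grid even though the RoIs differ in size, which is what lets the fully connected layers that follow accept every RoI.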
[![Fast R-CNN architecture](https://cloud.githubusercontent.com/assets/17261080/25041460/6e7cba40-2110-11e7-8650-faae2a6b0a92.png)](https://cloud.githubusercontent.com/assets/17261080/25041460/6e7cba40-2110-11e7-8650-faae2a6b0a92.png)
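The bounding-box head outputs four offsets per object class, and the offsets for the predicted class refine the RoI into the final box. The sketch assumes the parameterization Fast R-CNN inherits from R-CNN: a center shift proportional to the proposal's width and height, and a log-space scaling of the width and height. The name `decode_box` and the continuous `(x0, y0, x1, y1)` corner format are hypothetical choices for illustration.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), continuous coords (assumption)


def decode_box(proposal: Box, deltas: Sequence[float]) -> Box:
    """Apply predicted offsets (tx, ty, tw, th) for one class to a proposal box."""
    x0, y0, x1, y1 = proposal
    pw, ph = x1 - x0, y1 - y0              # proposal width and height
    px, py = x0 + 0.5 * pw, y0 + 0.5 * ph  # proposal center
    tx, ty, tw, th = deltas
    # Scale-invariant shift of the center...
    cx, cy = px + pw * tx, py + ph * ty
    # ...and log-space scaling of the width and height.
    w, h = pw * math.exp(tw), ph * math.exp(th)
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)


# Shift the center right by 10% of the width and double the width.
print(decode_box((0.0, 0.0, 10.0, 10.0), (0.1, 0.0, math.log(2.0), 0.0)))
```

Predicting offsets relative to the proposal, rather than absolute coordinates, keeps the regression targets small and independent of the RoI's size.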
## Results:
By sharing computation across RoIs of the same image and allowing single-stage training with SGD, it makes training substantially faster. At test time, however, it is still not as fast as YOLO9000.
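Single-stage training is possible because both heads are trained jointly with one multi-task loss per RoI. In Fast R-CNN this loss is the log loss on the true class $u$ plus, for non-background RoIs only, a smooth L1 loss between the predicted offsets $t^u$ and the regression targets $v$: $L = L_{cls} + \lambda [u \ge 1] L_{loc}$ (the paper uses $\lambda = 1$). The sketch computes it for a single RoI. The function names and plain-list inputs are hypothetical.

```python
import math
from typing import Sequence


def smooth_l1(x: float) -> float:
    """Robust L1 loss: quadratic for |x| < 1, linear beyond, so outliers matter less."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def multitask_loss(
    class_probs: Sequence[float],
    true_class: int,
    pred_offsets: Sequence[float],
    target_offsets: Sequence[float],
    lam: float = 1.0,
) -> float:
    """L = L_cls + lam * [u >= 1] * L_loc for one RoI (class 0 = background).

    pred_offsets are the 4 offsets predicted for the true class u.
    """
    l_cls = -math.log(class_probs[true_class])
    # Background RoIs have no ground-truth box, so the box term is skipped.
    if true_class == 0:
        return l_cls
    l_loc = sum(smooth_l1(p - t) for p, t in zip(pred_offsets, target_offsets))
    return l_cls + lam * l_loc


# Foreground RoI: log loss plus box loss (0.125 + 0 + 0 + 1.5).
print(multitask_loss([0.5, 0.5], 1, [0.5, 0.0, 0.0, 2.0], [0.0, 0.0, 0.0, 0.0]))
```

Because the loss is differentiable in both heads and in the shared layers, one SGD step updates the whole network. This replaces R-CNN's separate stages of fine-tuning, SVM training and box-regressor training.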