Fast RCNN is a region-proposal-based net for object detection tasks.
##### Input & Output
The inputs to Fast RCNN are the image and a set of region proposals (generated using Selective Search). The net has 2 outputs for each proposal: a softmax probability distribution over all object classes plus background (e.g. 21 classes for Pascal VOC'12), and the corresponding bounding box parameters for each object class.
##### Architecture
Turning any deep net into its Fast RCNN version needs 3 major modifications. For example, for VGG16:
1. An ROI pooling layer is added after the last conv layer, in place of the final maxpool and before the fully connected layers
2. The final FC layer is replaced by 2 sibling layers: one gives a softmax output of class probabilities, the other predicts an encoding of 4 bounding box parameters (x, y, width, height) w.r.t. the region proposals
3. The input is modified to take 2 inputs: images and the corresponding proposals
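
The box encoding mentioned in point 2 can be sketched as follows. This is a minimal illustration, assuming the RCNN-style parameterization (center offsets scaled by the proposal size, log-space width and height ratios); the function names `encode_box` and `decode_box` are hypothetical and not taken from any library.

```python
import math

def encode_box(proposal, target):
    """Encode a target box relative to a proposal.

    Both boxes are (center_x, center_y, width, height) tuples.
    Assumed parameterization: scale-invariant center shift, log size ratio.
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = target
    return (
        (gx - px) / pw,      # horizontal shift, in units of proposal width
        (gy - py) / ph,      # vertical shift, in units of proposal height
        math.log(gw / pw),   # log of the width ratio
        math.log(gh / ph),   # log of the height ratio
    )

def decode_box(proposal, deltas):
    """Invert encode_box: turn predicted deltas back into a box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (px + tx * pw, py + ty * ph, pw * math.exp(tw), ph * math.exp(th))
```

Under this assumed encoding the regression branch predicts the 4 deltas, and `decode_box` applies them to the proposal at test time to get the final box.
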
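**ROI Pooling layer** - The most notable contribution of the paper, this layer maxpools the features inside a proposed region into a fixed-size output (7 x 7 for the VGG16 version of Fast RCNN). The intuition behind the layer is to make the net faster than SPPNets (which use spatial pyramid pooling) and RCNN. The conv features are computed once per image, and every proposal is pooled from that shared feature map instead of being passed through the net separately as in RCNN.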
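
The pooling step can be sketched in plain Python. This is a simplified illustration, not the paper's implementation: it assumes a single-channel feature map stored as a list of rows, an ROI already projected to integer feature-map coordinates, and a hypothetical function name `roi_max_pool`.

```python
def roi_max_pool(feature_map, roi, out_h=7, out_w=7):
    """Max-pool one ROI of a 2-D feature map into an out_h x out_w grid.

    feature_map: list of rows (H x W), one channel.
    roi: (x1, y1, x2, y2) in feature-map coordinates, inclusive,
         assumed to lie fully inside the feature map.
    """
    x1, y1, x2, y2 = roi
    roi_h = y2 - y1 + 1
    roi_w = x2 - x1 + 1
    pooled = []
    for i in range(out_h):
        # Bin i spans rows [y_start, y_end): floor for the start and
        # ceil for the end, so a bin is never empty even for tiny ROIs.
        y_start = y1 + (i * roi_h) // out_h
        y_end = y1 + ((i + 1) * roi_h + out_h - 1) // out_h
        row = []
        for j in range(out_w):
            x_start = x1 + (j * roi_w) // out_w
            x_end = x1 + ((j + 1) * roi_w + out_w - 1) // out_w
            row.append(max(
                feature_map[y][x]
                for y in range(y_start, y_end)
                for x in range(x_start, x_end)
            ))
        pooled.append(row)
    return pooled
```

Whatever the size of the ROI, the output is always `out_h x out_w`, which is what lets proposals of different shapes feed the same fully connected layers. For a 4 x 4 map holding the values 0 to 15 in row-major order, pooling the whole map to 2 x 2 gives `[[5, 7], [13, 15]]`.
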
##### Results
The net is trained with a multi-task loss: log loss on the class probability output plus a smooth L1 loss on the bounding box parameters (the paper uses smooth L1 in place of the squared error loss used in RCNN).
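
A minimal sketch of this combined loss for a single ROI is given below. It uses the smooth L1 form given in the Fast RCNN paper for the box term and assumes class index 0 is background; the function names and the weighting argument `lam` are illustrative choices.

```python
import math

def smooth_l1(x):
    """Quadratic near zero, linear elsewhere (less sensitive to outliers)."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def multitask_loss(class_probs, true_class, pred_deltas, target_deltas, lam=1.0):
    """Loss for one ROI: classification log loss + weighted box loss.

    class_probs: softmax output over classes (index 0 assumed background).
    pred_deltas / target_deltas: 4 encoded box parameters for the true class.
    """
    cls_loss = -math.log(class_probs[true_class])
    if true_class == 0:
        # Background ROIs have no ground-truth box, so no box term.
        return cls_loss
    loc_loss = sum(smooth_l1(p - t) for p, t in zip(pred_deltas, target_deltas))
    return cls_loss + lam * loc_loss
```

The box term is only counted for ROIs labelled with an object class, so background proposals contribute to the classification loss alone.
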
The results were very impressive: on the VOC '07, '10 & '12 datasets, Fast RCNN outperformed the other nets it was compared against in terms of mAP (mean average precision).