Fast RCNN is a proposal detection net for object detection tasks.
##### Input & Output
The input to a Fast RCNN would be the input image and the region proposals (generated using Selective Search). There are 2 outputs of the net, probability map of all possible objects & background ( e.g. 21 classes for Pascal VOC'12) and corresponding bounding box parameters for each object classes.
The Fast RCNN version of any deep net would need 3 major modifications. For e.g. for VGG'16
1. A ROI pooling layer needs to be added after the final maxpool output before fully connected layers
2. The final FC layer is replaced by 2 sibling branched layers - one for giving a softmax output for probability classes, other one is for predicting an encoding of 4 bounding box parameters (x,y, width,height) w.r.t. region proposals
3. Modifying the input 2 take 2 input. images and corresponding prposals
**ROI Pooling layer** - The most notable contribution from the paper is designed to maxpool the features inside a proposed region into a fixed size (for VGG'16 version of FCNN it was 7 x 7) . The intuition behind the layer is make it faster as compared to SPPNets, (which used spatial pyramidal pooling) and RCNN.
The net is trained with dual loss (log loss on probability output + squared error loss on bounding box parameters) .
The results were very impressive, on the VOC '07, '10 & '12 datasets with Fast RCNN outperforming the rest of the nets, in terms of mAp accuracy