The problem this paper addresses is that object detection training sets are marked by a large imbalance between the number of annotated foreground examples and the number of background examples. To make the point concrete, in sliding-window object detectors like the Deformable Parts Model, the imbalance may be as extreme as 100,000 background examples to every one annotated foreground example.
Before I get to the details of hard example mining (HEM), note that HEM in essence means sorting your losses during training and training your model on the most difficult examples, which usually means the ones with the highest loss. (An extension of this idea can be found in the Focal Loss paper.) This is a simple but powerful technique.
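As a rough illustration, here is a minimal sketch of the core idea in PyTorch. This is not the paper's code: the function name, the `k` parameter, and the commented usage names (`model`, `inputs`, `targets`) are my own placeholders.

```python
import torch

def hard_example_mining(losses: torch.Tensor, k: int) -> torch.Tensor:
    """Return the indices of the k examples with the highest loss.

    `losses` is a 1-D tensor of per-example losses computed with
    reduction='none'; only these hard examples are backpropagated.
    """
    k = min(k, losses.numel())
    _, hard_idx = torch.topk(losses, k)
    return hard_idx

# Usage sketch (placeholder pipeline, not from the paper):
# logits = model(inputs)
# losses = torch.nn.functional.cross_entropy(logits, targets, reduction='none')
# hard = hard_example_mining(losses, k=64)
# losses[hard].mean().backward()  # update only on the hardest examples
```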
Taking this as our background: the authors propose a simple but effective method to train Fast R-CNN. Their approach is as follows:
1. For an input image at SGD iteration t, they first compute a convolutional feature map using the conv network.
2. The RoI network uses this feature map and all the input RoIs to do a forward pass.
3. Hard examples are selected by sorting the input RoIs by loss and taking the B/N examples for which the current network performs worst (here B is the batch size and N is the number of images per batch).
4. While doing this, the researchers note that co-located RoIs with high overlap are likely to have correlated losses. Also, overlapping RoIs project onto mostly the same region of the conv feature map, because the feature map is a denser/smaller representation of the input image. This can lead to loss double counting. To deal with it, they use standard non-maximum suppression (NMS).
5. NMS works here by iteratively selecting the RoI with the highest loss and removing all lower-loss RoIs that have high overlap with the selected region, using an IoU threshold of 0.7. (A sketch of steps 3-5 is given after this list.)
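Here is a minimal sketch of steps 3-5 under some assumptions of my own: `rois` is an (M, 4) tensor in (x1, y1, x2, y2) format, `losses` holds one loss per RoI, and the `iou_one_to_many` helper is a placeholder written for illustration, not the paper's implementation.

```python
import torch

def iou_one_to_many(box: torch.Tensor, boxes: torch.Tensor) -> torch.Tensor:
    """IoU between one box and many boxes, all in (x1, y1, x2, y2) format."""
    x1 = torch.maximum(box[0], boxes[:, 0])
    y1 = torch.maximum(box[1], boxes[:, 1])
    x2 = torch.minimum(box[2], boxes[:, 2])
    y2 = torch.minimum(box[3], boxes[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def select_hard_rois(rois, losses, batch_size_b, num_images_n, iou_thresh=0.7):
    """NMS on loss to remove near-duplicate RoIs, then keep the B/N hardest."""
    order = torch.argsort(losses, descending=True)
    keep = []
    while order.numel() > 0:
        i = order[0].item()          # RoI with the highest remaining loss
        keep.append(i)
        if order.numel() == 1:
            break
        ious = iou_one_to_many(rois[i], rois[order[1:]])
        order = order[1:][ious <= iou_thresh]  # drop high-overlap, lower-loss RoIs
    return keep[:batch_size_b // num_images_n]
```

Note that NMS here runs on losses rather than on classification scores: its only job is to deduplicate overlapping RoIs (and hence avoid double-counted losses) before the top-B/N hard examples are picked.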