#### Mask R-CNN framework for instance segmentation

### Goal:

Instance segmentation, which combines:

* object detection: classify individual objects and localize each using a bounding box
* semantic segmentation: classify each pixel into a fixed set of categories without differentiating object instances

https://i.imgur.com/XfBRa5O.png

Approach:

* Mask R-CNN extends Faster R-CNN by adding a branch for predicting segmentation masks on each Region of Interest (RoI), in parallel with the existing branch for classification and bounding-box regression.
* The mask branch is a small FCN applied to each RoI, predicting a segmentation mask in a pixel-to-pixel manner.

Two key design choices:

1. RoIAlign:
   * a quantization-free layer that fixes the misalignment introduced by RoIPool, faithfully preserving exact spatial locations
   * improves mask accuracy by a relative 10% to 50% while keeping the model fast
2. Decoupled mask and class prediction:
   * predict a binary mask for each class independently, without competition among classes; the RoI classification branch decides which class is predicted

History:

* R-CNN: the Region-based CNN approach to bounding-box object detection
* Fast R-CNN: speeding up and simplifying R-CNN
  * RoI (Region of Interest) Pooling
  * jointly trains the CNN, classifier, and bounding-box regressor in a single model
* Faster R-CNN: speeding up region proposal
  * reuses the same CNN features for region proposals instead of running a separate selective search algorithm; this is done by a Region Proposal Network (RPN)
  * only one CNN needs to be trained

Related Work:

* Instance segmentation: “fully convolutional instance segmentation” (FCIS)
* Faster R-CNN is a two-stage detector:
  * stage 1, the Region Proposal Network (RPN), proposes candidate object bounding boxes
  * stage 2, essentially Fast R-CNN [12], extracts features using RoIPool from each candidate box and performs classification and bounding-box regression
* Mask R-CNN adopts the same two-stage procedure with an identical first stage (RPN). It does not add a third stage: in the second stage, in parallel with predicting the class and box offset, it also outputs a binary mask for each RoI.
* Mask representation: the mask branch predicts an m×m mask from each RoI with an FCN, keeping the pixel-to-pixel spatial layout. This requires the RoI features, which are small fixed-size maps (e.g., 7×7), to be well aligned with the input image, which is what the RoIAlign layer provides.

#### Network Architecture

* a convolutional backbone used for feature extraction over the entire image (e.g., ResNet-50-C4, FPN)
* a network head for
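bounding-box recognition (classification and regression) and mask prediction, applied separately to each RoI

https://i.imgur.com/pUvKdmx.png

#### Training:

* images resized so that the shorter edge is 800 pixels
* mini-batch: 2 images per GPU
* N = 64 sampled RoIs per image for the C4 backbone (512 for FPN), with a 1:3 ratio of positives to negatives
* trained on 8 GPUs (effective mini-batch of 16 images) for 160k iterations
* learning rate: 0.02, decreased by 10 at 120k iterations
* data (COCO): trained on the union of 80k train images and a 35k subset of val images (trainval35k); ablations are reported on the remaining 5k val images (minival)

https://i.imgur.com/6ZLpewi.png
https://i.imgur.com/5o3um0Y.png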
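The N and 1:3 settings in the training list describe how RoIs are sampled per image for the second-stage heads. Below is a minimal Python sketch, assuming boxes are `(x1, y1, x2, y2)` tuples and that an RoI counts as positive when its IoU with some ground-truth box is at least 0.5 (the threshold Mask R-CNN uses). The names `iou` and `sample_rois` are hypothetical, and filling the batch with extra negatives when there are too few positives is an assumption borrowed from common implementations, not something stated in these notes.

```python
import random


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def sample_rois(proposals, gt_boxes, n=64, pos_fraction=0.25, pos_iou=0.5, rng=None):
    """Hypothetical helper: pick up to n RoIs with a 1:3 positive:negative ratio.

    An RoI is positive if its best IoU with any ground-truth box is >= pos_iou.
    Assumption: if there are too few positives, negatives fill the remainder.
    """
    rng = rng if rng is not None else random.Random(0)
    pos, neg = [], []
    for p in proposals:
        best = max((iou(p, g) for g in gt_boxes), default=0.0)
        (pos if best >= pos_iou else neg).append(p)
    num_pos = min(len(pos), int(n * pos_fraction))  # at most 25% positives
    num_neg = min(len(neg), n - num_pos)            # the rest are negatives
    return rng.sample(pos, num_pos), rng.sample(neg, num_neg)
```

With `n=64` and `pos_fraction=0.25` this keeps at most 16 positives and 48 negatives per image, matching the 1:3 ratio; for the FPN setting one would pass `n=512`.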
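The RoIAlign design choice under Goal avoids the rounding RoIPool applies to RoI boundaries and bins: each output bin is filled by bilinearly interpolating the feature map at a few regularly spaced points and aggregating them. The sketch below works on a single-channel feature map stored as a list of rows, takes the RoI already in feature-map coordinates, uses 2×2 sample points per bin with averaging, and clamps out-of-range points to the border. These choices, and the names `bilinear` and `roi_align`, are assumptions for illustration rather than the reference implementation.

```python
import math


def bilinear(feat, y, x):
    """Bilinearly interpolate a 2D feature map (list of rows) at real-valued (y, x).

    Assumption: points outside the map are clamped to the border.
    """
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1)
    x = min(max(x, 0.0), w - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    ly, lx = y - y0, x - x0
    top = feat[y0][x0] * (1 - lx) + feat[y0][x1] * lx
    bottom = feat[y1][x0] * (1 - lx) + feat[y1][x1] * lx
    return top * (1 - ly) + bottom * ly


def roi_align(feat, roi, out_size=7, sampling_ratio=2):
    """Hypothetical RoIAlign sketch: roi = (x1, y1, x2, y2), never rounded."""
    x1, y1, x2, y2 = roi
    bin_w = (x2 - x1) / out_size  # fractional bin sizes, no quantization
    bin_h = (y2 - y1) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0.0
            for sy in range(sampling_ratio):
                for sx in range(sampling_ratio):
                    # regularly spaced sample points inside bin (i, j)
                    y = y1 + i * bin_h + (sy + 0.5) * bin_h / sampling_ratio
                    x = x1 + j * bin_w + (sx + 0.5) * bin_w / sampling_ratio
                    total += bilinear(feat, y, x)
            row.append(total / sampling_ratio ** 2)  # average aggregation
        out.append(row)
    return out
```

Because bilinear interpolation reproduces a linear feature map exactly, feeding in `f(y, x) = x + 10*y` makes each output equal the value at its bin center, which is a convenient sanity check. RoIPool on the same fractional RoI would instead snap the boundaries to integer cells.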
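The decoupled mask and class prediction can be made concrete with the per-RoI mask loss. The mask branch outputs K masks of size m×m, one per class; a per-pixel sigmoid with binary cross-entropy is applied, and only the mask of the RoI's ground-truth class k contributes to the loss. The sketch below assumes masks are nested Python lists of logits and 0/1 targets; `mask_loss` and `bce_with_logits` are hypothetical names.

```python
import math


def bce_with_logits(z, t):
    """Binary cross-entropy of sigmoid(z) against target t in {0, 1}.

    Numerically stable form: max(z, 0) - z*t + log(1 + exp(-|z|)).
    """
    return max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))


def mask_loss(mask_logits, gt_class, gt_mask):
    """Average per-pixel BCE on the ground-truth class's mask only.

    mask_logits: list of K masks, each an m x m list of logits.
    The other K - 1 masks are ignored, so classes do not compete.
    """
    logits = mask_logits[gt_class]
    total, count = 0.0, 0
    for row_logits, row_target in zip(logits, gt_mask):
        for z, t in zip(row_logits, row_target):
            total += bce_with_logits(z, t)
            count += 1
    return total / count
```

Since the other K - 1 masks never enter the loss, the classes do not compete for pixels; the paper reports that this works better than a per-pixel softmax over classes, which couples mask and class prediction.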