#### Mask R-CNN framework for instance segmentation
### Goal:
* classify individual objects
* localize each using a bounding box
* semantic segmentation
https://i.imgur.com/XfBRa5O.png
* semantic segmentation classifies each pixel into a fixed set of categories without differentiating object instances.
* Mask R-CNN extends Faster R-CNN by adding a branch for predicting segmentation masks on each Region of Interest (RoI), in parallel with the existing branch for classification and bounding box regression.
* the mask branch is a small FCN applied to each RoI, predicting a segmentation mask in a pixel-to-pixel manner
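
As a compact way to read the parallel branches: the Mask R-CNN paper trains them jointly with a multi-task loss on each sampled RoI, where the first two terms are the classification and box-regression losses inherited from Faster R-CNN and the last term is the loss of the added mask branch.

$$
L = L_{cls} + L_{box} + L_{mask}
$$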
1. RoIAlign:
* Fixes the misalignment introduced by the coarse quantization in RoIPool and faithfully preserves exact spatial locations
* improves mask accuracy by a relative 10% to 50% while keeping inference fast
2. Decouple mask and class prediction:
* predict a binary mask for each class independently, without competition among classes
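
The decoupling in point 2 can be illustrated with a minimal pure-Python sketch. It assumes the mask branch emits one m x m map of raw scores (logits) per class, applies a per-pixel sigmoid, and averages the binary cross-entropy over the ground-truth class's mask only. The function names and the nested-list layout are hypothetical and chosen for illustration, not taken from any reference implementation.

```python
import math


def bce_with_logit(z, y):
    # Numerically stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))].
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def mask_loss(mask_logits, gt_mask, gt_class):
    """Average per-pixel binary cross-entropy on one class's mask.

    mask_logits: list of K masks, each an m x m list of raw scores.
    gt_mask:     m x m list of 0/1 labels for this RoI.
    gt_class:    index k of the RoI's ground-truth class.
    """
    # Only the k-th mask contributes; the other K-1 masks are ignored,
    # so classes do not compete for pixels (no softmax across classes).
    logits_k = mask_logits[gt_class]
    total, count = 0.0, 0
    for row_logits, row_gt in zip(logits_k, gt_mask):
        for z, y in zip(row_logits, row_gt):
            total += bce_with_logit(z, y)
            count += 1
    return total / count


# Toy example: K = 2 classes, 2 x 2 masks.
logits = [
    [[0.0, 0.0], [0.0, 0.0]],    # class 0: completely uncertain
    [[5.0, -5.0], [5.0, -5.0]],  # class 1: confident and matching gt
]
gt = [[1, 0], [1, 0]]
loss_if_class0 = mask_loss(logits, gt, gt_class=0)
loss_if_class1 = mask_loss(logits, gt, gt_class=1)
```

Because only the ground-truth class's mask is read, changing the logits of any other class leaves the loss unchanged. That independence is what "without competition among classes" refers to.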
#### History:
* R-CNN: the Region-based CNN (R-CNN) approach to bounding-box object detection
* Fast R-CNN: speeding up and simplifying R-CNN
* RoI (Region of Interest) Pooling
* jointly trains the CNN, classifier, and bounding-box regressor in a single model
* Faster R-CNN: speeding up region proposal
* reuses the same CNN features for region proposals through a Region Proposal Network (RPN), instead of running a separate selective search algorithm
* only one CNN needs to be trained
#### Related Work
* Instance Segmentation: “fully convolutional instance segmentation” (FCIS)
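* Faster R-CNN: in the first stage, a Region Proposal Network (RPN) proposes candidate object bounding boxes
* in the second stage, Fast R-CNN [12] extracts features using RoIPool from each candidate box and performs classification and bounding-box regression
* Mask R-CNN: adopts the same two-stage procedure as Faster R-CNN and, in the second stage, also outputs a binary mask for each RoI in parallel with the class and box predictions
* Mask Representation: the mask is predicted pixel to pixel from each RoI, which requires the small RoI feature map (e.g. 7×7) to stay well aligned with the input; this is what the RoIAlign layer provides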
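
To make the RoIAlign idea concrete, here is a minimal pure-Python sketch under stated assumptions: a single-channel feature map stored as a list of rows, RoI coordinates already expressed in feature-map units with integer coordinates at cell centres, and a few regularly spaced sample points per bin that are bilinearly interpolated and averaged. The names `bilinear` and `roi_align` and their parameters are hypothetical; real implementations differ in coordinate conventions and run on batched tensors.

```python
import math


def bilinear(feat, y, x):
    """Bilinearly interpolate a 2D feature map at a real-valued (y, x)."""
    h, w = len(feat), len(feat[0])
    # Clamp to the valid range so border samples stay defined.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feat, roi, out_size=2, samples=2):
    """Pool an RoI into an out_size x out_size grid without rounding.

    roi = (y1, x1, y2, x2) as floats; no coordinate is quantized.
    """
    y1, x1, y2, x2 = roi
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            acc = 0.0
            for sy in range(samples):
                for sx in range(samples):
                    # Regularly spaced sample points inside bin (i, j).
                    y = y1 + (i + (sy + 0.5) / samples) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples) * bin_w
                    acc += bilinear(feat, y, x)
            row.append(acc / (samples * samples))
        out.append(row)
    return out


# Toy feature map whose value equals its column index.
feat = [[float(x) for x in range(4)] for _ in range(4)]
pooled = roi_align(feat, (0.5, 0.5, 2.5, 2.5), out_size=2, samples=2)
```

The contrast with RoIPool is that the RoI boundaries and bin edges stay fractional throughout. Values are read by interpolation instead of snapping to the nearest cell, so a sub-cell shift of the box produces a correspondingly shifted output.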
#### Network Architecture
* convolutional backbone architecture used for feature extraction over an entire image (ResNet-50-C4, FPN)
* network head for bounding-box recognition (classification and regression) and mask prediction
https://i.imgur.com/pUvKdmx.png
#### Training:
* Images resized: shorter side of 800 pixels
* mini-batch: 2 images per GPU
* N (sampled RoIs per image): 64
* train: on 8 GPUs for 160k iterations
* learning rate: 0.02
* train images: 80K
* val images: 35K
* minival: 5K
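
The settings above can be gathered into a small config object to make the implied arithmetic explicit. This is only a sketch: the class and field names are hypothetical, and the epoch estimate assumes training uses the 80K train images together with the 35K val images, with the 5K minival images held out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    image_scale_px: int = 800      # resize target listed in the notes
    images_per_gpu: int = 2
    num_gpus: int = 8
    rois_per_image: int = 64       # N
    iterations: int = 160_000
    base_lr: float = 0.02
    train_images: int = 80_000 + 35_000  # assumption: train + val subset

    @property
    def effective_batch_size(self) -> int:
        # Images processed per optimizer step across all GPUs.
        return self.images_per_gpu * self.num_gpus

    @property
    def images_seen(self) -> int:
        return self.effective_batch_size * self.iterations

    @property
    def approx_epochs(self) -> float:
        return self.images_seen / self.train_images


cfg = TrainConfig()
print(cfg.effective_batch_size)  # 16
print(cfg.images_seen)           # 2560000
```

Under these assumptions the schedule corresponds to roughly 22 passes over the training images.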
https://i.imgur.com/6ZLpewi.png
https://i.imgur.com/5o3um0Y.png