The paper proposes a new object detection method that detects novel classes using Conditional Matching. The detector can be conditioned on either an image or text: a user supplies an image patch or a text query, and the model returns the corresponding bounding boxes in the picture. The model differs from other open-vocabulary detectors in two ways:

1) Other detectors rely on a Region Proposal Network (RPN), which cannot cover all the objects in a picture and therefore worsens performance on novel objects. This work instead uses CLIP to handle novel objects; it is better than an RPN because the object queries act as readers over the whole picture, so they can cover many more objects.

https://i.imgur.com/GqvvSVs.png

2) Other detectors rely on bipartite matching between class label names and detected bounding boxes. The downside of bipartite matching is that it cannot match novel objects to any label name, because novel objects have no labels. This work instead proposes Conditional Matching, which turns the assignment into a binary matching problem: each detected object is simply assigned a "matched" or "not matched" label with respect to the conditional input.

https://i.imgur.com/FjI2iub.png
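To make the "matched" / "not matched" idea concrete, here is a minimal toy sketch of conditional binary matching: each query embedding is compared against one conditional embedding (e.g. a CLIP text or image embedding) and given a binary label, rather than being assigned a class name. The function name, the cosine-similarity scoring, and the fixed threshold are my own illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def conditional_matching(query_embeds, cond_embed, threshold=0.5):
    # Toy sketch (illustrative, not the paper's exact head):
    # compare each detected query embedding against ONE conditional
    # input embedding and return True ("matched") / False ("not matched").
    q = query_embeds / np.linalg.norm(query_embeds, axis=1, keepdims=True)
    c = cond_embed / np.linalg.norm(cond_embed)
    sims = q @ c                 # cosine similarity, one score per query
    return sims >= threshold     # binary match labels

# Toy usage: 3 query embeddings, conditioned on one text/image embedding.
queries = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
cond = np.array([1.0, 0.0])
print(conditional_matching(queries, cond))  # [ True False  True]
```

Because every query only answers a yes/no question against the conditional input, the same head works for classes never seen at training time, which is exactly why this replaces bipartite matching against a fixed label set.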