Open-Vocabulary DETR with Conditional Matching on ShortScience.org

Open-Vocabulary DETR with Conditional Matching
Yuhang Zang and Wei Li and Kaiyang Zhou and Chen Huang and Chen Change Loy
arXiv e-Print archive - 2022 via Local arXiv
Keywords: cs.CV, cs.AI

Summary by ngthanhtinqn 2 years ago

The paper proposes OV-DETR, an open-vocabulary object detector that can detect novel classes by using conditional matching. The detector can be conditioned on either an image or text, which means a user can supply an exemplar image or a class name and the model will predict bounding boxes for the corresponding objects in the picture.
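The conditioning idea can be sketched as follows, assuming (as a simplification) that each conditional input, either a CLIP text embedding of a class name or a CLIP image embedding of an exemplar, is passed through a linear projection and added to every object query, giving one full copy of the queries per input. The functions `linear` and `condition_queries`, the projection weights, and the toy dimensions are hypothetical illustrations, not the authors' code.

```python
from typing import List

Vector = List[float]


def linear(x: Vector, weight: List[Vector], bias: Vector) -> Vector:
    """Compute x @ weight + bias; weight has len(x) rows and len(bias) columns."""
    return [sum(x[i] * weight[i][j] for i in range(len(x))) + bias[j]
            for j in range(len(bias))]


def condition_queries(queries: List[Vector], cond_embeddings: List[Vector],
                      weight: List[Vector], bias: Vector) -> List[List[Vector]]:
    """Return one conditioned copy of all object queries per conditional input.

    queries: N object queries of dimension D.
    cond_embeddings: R embeddings of dimension E (text OR image, e.g. from CLIP).
    weight, bias: hypothetical E -> D projection into the query space.
    """
    conditioned = []
    for z in cond_embeddings:
        shift = linear(z, weight, bias)  # project the embedding to dimension D
        # Add the same projected embedding to every query in this copy.
        conditioned.append([[q_d + s_d for q_d, s_d in zip(q, shift)]
                            for q in queries])
    return conditioned
```

Because text and image embeddings pass through the same interface, swapping a class name for an exemplar image only changes `cond_embeddings`, not the rest of the detector.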

Compared with other open-vocabulary detectors, this model makes two changes:

1) Other detectors rely on a Region Proposal Network (RPN), which can fail to propose all the objects in a picture, especially novel ones, and this hurts novel-object detection. In this work, the detector is instead built on DETR: its object queries act as readers that attend to the whole image, so together they can cover many more of the objects in the picture. CLIP is used to provide the text or image embeddings that the queries are conditioned on.

https://i.imgur.com/GqvvSVs.png

2) DETR-style detectors rely on bipartite matching between predicted boxes and ground-truth objects, and the matching cost includes a classification term computed over a fixed set of class labels. The downside is that this cost cannot be computed for novel objects, because they have no labels during training. So, in this work, they propose conditional matching, which turns the matching problem into a binary one: with respect to a given conditional input (a class name or an exemplar image), each prediction is assigned either a "matched" or a "not matched" label.

https://i.imgur.com/FjI2iub.png
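A minimal sketch of the binary matching step, under stated assumptions: the matcher runs once per conditional input, over the queries conditioned on that input and the ground-truth boxes of that class; the cost is simplified to the negated "matched" probability plus a weighted L1 box distance (the paper's exact cost terms and weights may differ); and a brute-force search over permutations stands in for the Hungarian algorithm (DETR's reference matcher uses SciPy's `linear_sum_assignment`). The names `conditional_match`, `binary_targets`, `w_cls`, and `w_box` are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), normalized


def l1(a: Box, b: Box) -> float:
    """L1 distance between two boxes."""
    return sum(abs(x - y) for x, y in zip(a, b))


def conditional_match(pred_boxes: List[Box], match_probs: List[float],
                      gt_boxes: List[Box], w_cls: float = 1.0,
                      w_box: float = 5.0) -> Dict[int, int]:
    """Match the ground-truth boxes of ONE conditional input to its queries.

    match_probs[i] is the predicted probability that query i is "matched" to
    the conditional input: a binary score, not a distribution over classes.
    Returns {query_index: gt_index}; all other queries are "not matched".
    Brute force stands in for the Hungarian algorithm (fine for toy sizes).
    """
    n, m = len(pred_boxes), len(gt_boxes)
    assert m <= n, "need at least as many queries as ground-truth boxes"
    best_cost, best = float("inf"), ()
    for chosen in permutations(range(n), m):  # chosen[g] = query for gt box g
        cost = sum(-w_cls * match_probs[q] + w_box * l1(pred_boxes[q], gt_boxes[g])
                   for g, q in enumerate(chosen))
        if cost < best_cost:
            best_cost, best = cost, chosen
    return {q: g for g, q in enumerate(best)}


def binary_targets(num_queries: int, matches: Dict[int, int]) -> List[int]:
    """Training targets: 1 = "matched", 0 = "not matched"."""
    return [1 if i in matches else 0 for i in range(num_queries)]
```

For example, with three queries conditioned on a "cat" embedding and one cat in the image, only the query whose box and matched score fit best receives target 1. Since the label is binary, the same matcher applies unchanged to a class never seen in training; only the conditional input differs.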
