Deep Image Retrieval: Learning global representations for image search on ShortScience.org

Deep Image Retrieval: Learning global representations for image search
Gordo, Albert and Almazán, Jon and Revaud, Jérome and Larlus, Diane
arXiv e-Print archive - 2016 via Local Bibsonomy
Keywords: dblp

Summary by Yin Xia 8 years ago

**Contributions:**
- A triplet ranking loss, implemented as a three-stream Siamese network.
- A region proposal network is integrated into the system. All operations are differentiable, making the system end-to-end trainable.
- A dataset cleaning method is proposed, which is critical for the performance boost.
- Performance surpasses previous global descriptors and most local-descriptor-based methods on the Landmarks dataset.

**Training:**
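- Sample triplets (query, relevant image, non-relevant image) and apply a triplet hinge loss:
 $L(I_q, I^+, I^-)=\max(0, m+q^Td^- - q^Td^+)$,
 where $q$, $d^+$ and $d^-$ are the descriptors of $I_q$, $I^+$ and $I^-$, and $m$ is the margin.
- Since the CNN uses only convolutional layers, and the aggregation step does not require a fixed input size, images can be processed at full resolution.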
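
The hinge loss above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names, the toy 2-D descriptors and the margin value of 0.1 are placeholders chosen here, not taken from the summary.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale a vector to unit L2 norm (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def triplet_hinge_loss(q, d_pos, d_neg, margin=0.1):
    """max(0, m + q.d_neg - q.d_pos) for L2-normalized descriptors.

    The margin default is an assumed value for illustration.
    """
    return max(0.0, margin + float(q @ d_neg) - float(q @ d_pos))

# Toy descriptors (hypothetical values).
q = l2_normalize(np.array([1.0, 0.0]))
d_pos = l2_normalize(np.array([1.0, 0.1]))
d_neg = l2_normalize(np.array([0.0, 1.0]))

# Correctly ranked triplet: the positive beats the negative by more than the margin.
easy = triplet_hinge_loss(q, d_pos, d_neg)
# Swapped roles: the "negative" is closer to the query, so the loss is positive.
hard = triplet_hinge_loss(q, d_neg, d_pos)
```

Triplets that are already ranked correctly by more than the margin contribute zero loss, so only violating triplets produce a gradient.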

**Network data flow:**
- Use the convolutional layers of a pre-trained network to extract activation features.
- Max-pool the activations over different regions, defined by a multi-scale rigid grid with overlapping cells. Note that this is ROI pooling, which is differentiable.
- L2-normalize the region features, whiten them with PCA, and L2-normalize again. The PCA projection can be implemented as a shift (mean subtraction) followed by a fully connected (FC) layer.
- Aggregate: sum the region features and L2-normalize the result.
- The dot product between two image vectors can be interpreted as approximate many-to-many region matching.
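
The data flow above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: `grid_regions` is a simplified stand-in for the multi-scale grid, the random feature map replaces real CNN activations, and the zero mean and identity projection are placeholders for a learned PCA-whitening shift and FC layer.

```python
import numpy as np

def l2n(x, eps=1e-12):
    """L2-normalize along the last axis."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def grid_regions(h, w, scales=(1, 2, 3)):
    """Overlapping cells on a rigid grid, s x s cells at scale s (simplified layout)."""
    regions = []
    for s in scales:
        ch = int(np.ceil(2 * h / (s + 1)))  # cell height at this scale
        cw = int(np.ceil(2 * w / (s + 1)))  # cell width at this scale
        ys = np.linspace(0, h - ch, s).astype(int)
        xs = np.linspace(0, w - cw, s).astype(int)
        regions += [(y, x, ch, cw) for y in ys for x in xs]
    return regions

def global_descriptor(fmap, mean, proj):
    """fmap: (H, W, C) activations; mean: (C,) shift; proj: (C, D) projection."""
    h, w, _ = fmap.shape
    # Max-pool each region into one C-dimensional vector.
    regs = np.stack([fmap[y:y + ch, x:x + cw].max(axis=(0, 1))
                     for y, x, ch, cw in grid_regions(h, w)])
    regs = l2n(regs)                  # first L2 normalization
    regs = l2n((regs - mean) @ proj)  # shift + FC layer, then L2 again
    return l2n(regs.sum(axis=0))      # sum-aggregate and L2-normalize

rng = np.random.default_rng(0)
fmap = rng.random((8, 10, 16))        # stand-in for conv activations
mean, proj = np.zeros(16), np.eye(16) # placeholder (not learned) PCA parameters
desc = global_descriptor(fmap, mean, proj)
```

Because the descriptor is a sum of region vectors, the dot product of two such sums expands into the sum of all pairwise region dot products, which is where the many-to-many matching interpretation comes from (up to the final normalization).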

**Region Proposal Network:**
- The objective function is a multi-task loss that combines a classification loss and a regression loss.
- At inference time, non-maximum suppression is applied and only the top K proposals are kept for each image.
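
The suppression step can be illustrated with a greedy non-maximum suppression that stops after K boxes. This is a generic sketch, assuming boxes in `(x1, y1, x2, y2)` format; the IoU threshold, the default K and the toy boxes are hypothetical values, not settings reported in the summary.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, all (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_box + area_boxes - inter)

def nms_top_k(boxes, scores, iou_thresh=0.7, k=300):
    """Greedy NMS: keep the best box, drop heavy overlaps, repeat up to k times."""
    order = np.argsort(-scores)  # indices sorted by descending score
    keep = []
    while order.size > 0 and len(keep) < k:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Retain only boxes that do not overlap the kept box too much.
        order = rest[iou(boxes[i], boxes[rest]) <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10],
                  [0, 1, 10, 11],    # overlaps the first box heavily
                  [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms_top_k(boxes, scores)
kept_one = nms_top_k(boxes, scores, k=1)
```

In this toy case the second box is suppressed by the first, and the third survives because it does not overlap either of them.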

**Landmark Dataset Cleaning:**
- Construct an image graph whose edges carry pairwise similarity scores. The scores are computed offline, using invariant keypoint matching and spatial verification.
- Extract the connected components of the graph. They correspond to different profiles of a landmark.
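
The graph step can be sketched with a plain depth-first search. This is an illustrative sketch only: the `(i, j, score)` input format, the `min_score` threshold used to decide which edges count, and the toy scores are assumptions made here, and the offline matcher that would produce the scores is not shown.

```python
from collections import defaultdict

def connected_components(n_images, scored_pairs, min_score):
    """Group images 0..n_images-1 into connected components.

    scored_pairs: iterable of (i, j, score) from an offline matcher (assumed format).
    Only pairs with score >= min_score become edges (assumed pruning rule).
    """
    adj = defaultdict(set)
    for i, j, score in scored_pairs:
        if score >= min_score:
            adj[i].add(j)
            adj[j].add(i)

    seen, components = set(), []
    for start in range(n_images):
        if start in seen:
            continue
        # Depth-first traversal from an unvisited image.
        stack, comp = [start], []
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(sorted(comp))
    return components

# Toy scores (hypothetical): images 0-2 and 3-4 match strongly, 2-3 only weakly.
pairs = [(0, 1, 40), (1, 2, 35), (3, 4, 50), (2, 3, 3)]
components = connected_components(6, pairs, min_score=20)
```

Image 5 has no verified match and ends up alone, which is the kind of unrelated image a cleaning step would want to isolate.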
