To Aggregate or Not to aggregate: Selective Match Kernels for Image Search
Giorgos Tolias, Yannis S. Avrithis, and Hervé Jégou
International Conference on Computer Vision (ICCV), 2013
Descriptors and matching kernels are key components of an image search system. This paper presents a framework for matching kernels that covers non-aggregated kernels such as Hamming Embedding (HE) and aggregated kernels such as Bag-of-Words (BoW) and the vector of locally aggregated descriptors (VLAD). To evaluate the effectiveness of aggregation, the paper introduces the selective match kernel (SMK, non-aggregated) and the aggregated selective match kernel (ASMK), both built on this framework. Experimental results show that ASMK outperforms SMK and state-of-the-art methods because ASMK handles burstiness better than SMK.
Technical details
The framework for matching kernels is described by the following general form.
$$K(\mathcal{X},\mathcal{Y}) = \gamma(\mathcal{X})\gamma(\mathcal{Y})
\displaystyle\sum\_{c \in C} w\_c M (\mathcal{X}\_c,\mathcal{Y}\_c)$$
where $\mathcal{X}$ and $\mathcal{Y}$ are the sets of descriptors of two images, $\mathcal{X}\_c$ and $\mathcal{Y}\_c$ are the subsets of those descriptors assigned to a particular visual word $c$, $M$ is a similarity function between the two subsets, $w\_c$ is a scalar weight, and $\gamma$ is a normalization factor.
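The general form can be sketched in a few lines of Python. This is only an illustration: the helper names (`group_by_word`, `match_kernel`, `assign`) are hypothetical, the normalization $\gamma$ is assumed to make the self-similarity equal to 1, and visual words missing from either image are assumed to contribute nothing.

```python
import math
from collections import defaultdict

def group_by_word(descriptors, assign):
    """Split a descriptor set into subsets X_c, keyed by visual word c."""
    groups = defaultdict(list)
    for x in descriptors:
        groups[assign(x)].append(x)
    return groups

def match_kernel(X, Y, assign, M, w=lambda c: 1.0):
    """K(X, Y) = gamma(X) * gamma(Y) * sum_c w_c * M(X_c, Y_c)."""
    Xg, Yg = group_by_word(X, assign), group_by_word(Y, assign)

    def raw(A, B):
        # Assumption: a visual word absent from either image contributes 0,
        # so only the words shared by both images are visited.
        return sum(w(c) * M(A[c], B[c]) for c in A.keys() & B.keys())

    # Assumption: gamma(X) = raw(X, X) ** -0.5, so that K(X, X) = 1.
    gamma_x = 1.0 / math.sqrt(raw(Xg, Xg))
    gamma_y = 1.0 / math.sqrt(raw(Yg, Yg))
    return gamma_x * gamma_y * raw(Xg, Yg)

# Toy usage: 1-D "descriptors", quantized by truncation, with a BoW-style
# similarity M(X_c, Y_c) = |X_c| * |Y_c|.
bow = lambda Xc, Yc: len(Xc) * len(Yc)
X = [0.1, 0.2, 1.5]
Y = [0.3, 2.7]
K = match_kernel(X, Y, assign=int, M=bow)
```

Swapping the `M` argument is what distinguishes one kernel of the framework from another; the grouping and normalization stay the same.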
The proposed selective match kernel (SMK) is defined by
$$M\_N(\mathcal{X}\_c,\mathcal{Y}\_c) =
\displaystyle\sum\_{x \in \mathcal{X}\_c}
\displaystyle\sum\_{y \in \mathcal{Y}\_c}
\sigma (\phi(x)^T\phi(y))$$
where $\phi$ maps a descriptor to its vector representation and $\sigma$ is a scalar selectivity function. Note that $|\mathcal{X}\_c| \times |\mathcal{Y}\_c|$ matches (dot products) are needed for each visual word.
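A minimal sketch of the non-aggregated similarity makes the pairwise cost explicit. The selectivity function used here (a thresholded power function with hypothetical parameters `alpha` and `tau`) and the identity default for `phi` are assumptions chosen for illustration; the text above does not fix them.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigma(u, alpha=3.0, tau=0.0):
    # Assumed selectivity function: discard similarities at or below tau,
    # otherwise return sign(u) * |u| ** alpha to down-weight weak matches.
    return math.copysign(abs(u) ** alpha, u) if u > tau else 0.0

def m_non_aggregated(Xc, Yc, phi=lambda x: x):
    """M_N(X_c, Y_c): sigma summed over every pair (x, y)."""
    return sum(sigma(dot(phi(x), phi(y))) for x in Xc for y in Yc)

# Toy usage: two descriptors on one side, one on the other,
# so 2 * 1 = 2 dot products are evaluated.
Xc = [(1.0, 0.0), (0.0, 1.0)]
Yc = [(1.0, 0.0)]
score = m_non_aggregated(Xc, Yc)  # 1.0: only the identical pair survives sigma
```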
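The proposed aggregated selective match kernel (ASMK) is defined by
$$M\_A(\mathcal{X}\_c,\mathcal{Y}\_c) =
\sigma \left( \psi\Big(\displaystyle\sum\_{x \in \mathcal{X}\_c} \phi(x)\Big)^T
\psi\Big(\displaystyle\sum\_{y \in \mathcal{Y}\_c} \phi(y)\Big) \right)$$
where $\psi$ is a function applied to the aggregated vector, such as $\ell\_2$ normalization.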
Note that only one match (dot product) is needed for each visual word.
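To illustrate why a single dot product suffices, and why aggregation dampens burstiness, here is a hedged sketch. It assumes the descriptors of a visual word are summed and then l2-normalized before matching, and it uses a thresholded power function as the selectivity function; the names `aggregate`, `m_aggregated`, `alpha`, and `tau` are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigma(u, alpha=3.0, tau=0.0):
    # Assumed selectivity function (thresholded power function).
    return math.copysign(abs(u) ** alpha, u) if u > tau else 0.0

def aggregate(Xc, phi=lambda x: x):
    """Sum phi(x) over the subset, then l2-normalize (assumed post-processing)."""
    total = [sum(col) for col in zip(*(phi(x) for x in Xc))]
    norm = math.sqrt(dot(total, total))
    return [t / norm for t in total] if norm > 0 else total

def m_aggregated(Xc, Yc):
    """One dot product per visual word, whatever the subset sizes."""
    return sigma(dot(aggregate(Xc), aggregate(Yc)))

# Toy usage: a "bursty" visual word with three identical descriptors.
Xc = [(1.0, 0.0), (1.0, 0.0), (1.0, 0.0)]
Yc = [(1.0, 0.0)]
score = m_aggregated(Xc, Yc)  # 1.0
```

With the same selectivity function, a sum over all pairs would count this bursty word three times, whereas the aggregated score stays at 1.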
Results
As shown in Figure 5, ASMK outperforms SMK and SMK-BURST, where BURST refers to burstiness normalization.

Table 4 shows that ASMK outperforms state-of-the-art methods.

Note that all the results above are from the initial result set. Re-ranking approaches are not included.