Collaborative Metric Learning
Hsieh, Cheng-Kang; Yang, Longqi; Cui, Yin; Lin, Tsung-Yi; Belongie, Serge J.; Estrin, Deborah
ACM WWW 2017
## Idea
Use implicit feedback and item features to project users and items into the same latent space, which is later queried with kNN to produce recommendations. The learned metric encodes user-item, user-user, and item-item relationships.
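Once training is done, recommendation reduces to nearest-neighbour search in the joint space. A minimal NumPy sketch of that retrieval step (the embedding matrices `U`, `V` and the `recommend` helper are illustrative, not from the paper):

```python
import numpy as np

# Illustrative learned embeddings: users and items share the same r-dimensional space.
rng = np.random.default_rng(0)
U = rng.normal(size=(1000, 64))   # user vectors u_i
V = rng.normal(size=(5000, 64))   # item vectors v_j

def recommend(user_id, top_k=10):
    # Euclidean distance from one user to every item; smaller distance = stronger preference.
    dist = np.linalg.norm(V - U[user_id], axis=1)
    return np.argsort(dist)[:top_k]   # indices of the top_k nearest items

print(recommend(user_id=42))
```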
## Loss
Users and items are represented by vectors $u_i \in \mathcal{R}^r, v_j \in \mathcal{R}^r$.
We define the Euclidean distance as $d(i,j) = \parallel u_i - v_j \parallel$.
Loss function consists of 3 parts:
$$\mathcal{L}=\mathcal{L}_m + \lambda_f\mathcal{L}_f + \lambda_c\mathcal{L}_c$$
### Weighted Triplet Loss
Sample user $i$, positive item $j$ and negative item $k$.
$$\mathcal{L}_m=\sum_{i,j,k}w_{ij}[d(i,j)^2-d(i,k)^2+m]_{+}$$
where $[z]_{+}=\max(0,z)$ and $m>0$ is the margin size.
$w_{ij}$ is computed in WARP fashion, but by sampling $U$ negative items for each positive pair $(i,j)$ instead of sampling until an imposter is found:
$$w_{ij}=\log\left(\left\lfloor |Items| \frac{M}{U}\right\rfloor + 1\right)$$
where $M$ is the number of imposters among the $U$ sampled negative items.
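A minimal NumPy sketch of this weighted hinge loss for a single positive pair, assuming the $U$ negative items have already been sampled (function and variable names are mine, not the authors'):

```python
import numpy as np

def weighted_triplet_loss(u_i, v_j, V_neg, n_items, margin=0.5):
    """WARP-weighted hinge loss for one positive pair (i, j).
    u_i, v_j: user and positive-item vectors; V_neg: the U sampled negative item vectors.
    The per-pair formulation here is illustrative, not taken from the authors' code."""
    d_pos = np.sum((u_i - v_j) ** 2)                 # d(i, j)^2
    d_neg = np.sum((u_i - V_neg) ** 2, axis=1)       # d(i, k)^2 for each sampled negative k
    hinge = np.maximum(0.0, d_pos - d_neg + margin)  # [d(i,j)^2 - d(i,k)^2 + m]_+
    M = np.count_nonzero(hinge > 0)                  # imposters among the U samples
    w_ij = np.log(np.floor(n_items * M / len(V_neg)) + 1)  # approximate-rank weight
    return w_ij * np.sum(hinge)
```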
### Loss for features
Let $x_j \in \mathcal{R}^m$ denote the raw feature vector of item $j$. We want its projection to be close to the corresponding item vector $v_j$.
$$\mathcal{L}_f=\sum_j \parallel f(x_j) - v_j \parallel ^2$$
where $f$ is a learnable transformation (an MLP with dropout) that maps item features into the joint space.
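A sketch of $\mathcal{L}_f$ with a one-hidden-layer MLP standing in for $f$; the layer count, ReLU activation, dropout placement, and weight shapes are assumptions (the summary only specifies an MLP with dropout):

```python
import numpy as np

def feature_loss(X, V, W1, b1, W2, b2, drop_p=0.5, rng=np.random.default_rng(0)):
    """L_f: squared distance between f(x_j) and the item vectors v_j.
    X: raw item features (n_items, m); V: item embeddings (n_items, r);
    W1 (m, hidden), W2 (hidden, r): MLP weights."""
    h = np.maximum(0.0, X @ W1 + b1)            # ReLU hidden layer
    h = h * (rng.random(h.shape) > drop_p)      # dropout mask (training mode)
    f_x = h @ W2 + b2                           # projection into the joint space R^r
    return np.sum((f_x - V) ** 2)               # sum_j ||f(x_j) - v_j||^2
```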
### Regularization
kNN is ineffective in a high-dimensional sparse space, so user and item vectors are bounded within the unit ball.
$$\parallel u_* \parallel ^2 \leq 1$$
$$\parallel v_* \parallel ^2 \leq 1$$
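One common way to enforce such a bound is to rescale any embedding whose norm exceeds 1 after each gradient step; the sketch below assumes that training-loop convention (the summary itself does not say how the constraint is enforced):

```python
import numpy as np

def clip_to_unit_ball(E):
    """Rescale each row of an embedding matrix so its norm is at most 1,
    enforcing ||u_*||^2 <= 1 and ||v_*||^2 <= 1 after an update."""
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    return E / np.maximum(norms, 1.0)
```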
$L_2$ regularization is not used because it pulls every object toward the origin, which has no specific meaning in this space. Covariance regularization is used instead to de-correlate the dimensions of the learned metric.
The covariances between all pairs of dimensions $i$ and $j$, computed over a batch of $N$ latent vectors $y^n$, form a matrix $C$:
$$C_{ij} = \frac{1}{N} \sum_n (y_i^n - \mu_i)(y_j^n - \mu_j), \qquad \mu_i = \frac{1}{N}\sum_n y_i^n$$
The regularizer penalizes its off-diagonal entries:
$$\mathcal{L}_c = \frac{1}{N}\left(\parallel C \parallel_f - \parallel \mathrm{diag}(C) \parallel_2^2\right)$$
where $\parallel \cdot \parallel_f$ is the Frobenius norm.
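A sketch of the covariance penalty over a batch of latent vectors (one object per row of `Y`); it follows the formulas above directly:

```python
import numpy as np

def covariance_loss(Y):
    """L_c over a batch Y of N latent vectors, penalizing off-diagonal
    covariances between embedding dimensions."""
    N = Y.shape[0]
    mu = Y.mean(axis=0)                  # mu_i = (1/N) * sum_n y_i^n
    C = (Y - mu).T @ (Y - mu) / N        # covariance matrix C_ij
    return (np.linalg.norm(C, "fro") - np.sum(np.diag(C) ** 2)) / N
```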