Darel's profile - ShortScience.org

doi.org
sci-hub
scholar.google.com

Latent Relational Metric Learning via Memory-based Attention for Collaborative Ranking
Yi Tay and Luu Anh Tuan and Siu Cheung Hui
Proceedings of the 2018 World Wide Web Conference on World Wide Web - WWW '18 - 2018 via Local CrossRef
Keywords:

[link] Summary by Darel 5 years ago

This work is a direct improvement of Collaborative Metric Learning. While CML tries to retrieve user and item embeddings in a direct way by placing them in metric space and adjusting with triplet loss, this paper focuses on introduction of latent relational vectors.

 A relational vector $r$ must describe relation between user $p$ and item $q$ in a way that  $s(p,q)=\parallel \ p + r - q \parallel \approx 0$.

Vectors $r$ are introduced as a softmax-weighted linear combination of vectors from Latent Relational Attentive Memory (LRAM).

The net is trained with BPR-like loss via negative sampling $max(0, s(p,q) - s(p', q') + margin)$

doi.org
sci-hub
scholar.google.com

Collaborative Metric Learning
Hsieh, Cheng-Kang and Yang, Longqi and Cui, Yin and Lin, Tsung-Yi and Belongie, Serge J. and Estrin, Deborah
ACM WWW - 2017 via Local Bibsonomy
Keywords: dblp

[link] Summary by Darel 5 years ago

## Idea
Use implicit feedback and item features to project users and items into the same latent space to use with kNN later. Learned metric encodes user-item, user-user and item-item relationships.
## Loss
Users and items are represented by vectors $u_i \in \mathcal{R}^r, v_i \in \mathcal{R}^r$.

We define euclidean distance as $d(i,j)= \parallel u_i-v_j\  \parallel$

Loss function consists of 3 parts:
$$\mathcal{L}=\mathcal{L}_m + \lambda_f\mathcal{L}_f + \lambda_c\mathcal{L}_c$$

### Weighted Triplet Loss
Sample user $i$, positive item $j$ and negative item $k$.
$$\mathcal{L}_m=\sum_{i,j,k}w_{ij}[d(i,j)^2-d(i,k)^2+m]_{+}$$
where $[z]_{+}=max(0,z)$ and $m>0$ is the margin size.

$w_{i,j}$ is calculated in WARP fashion, but sampling $U$ negative items for each positive pair $(i,j)$ instead of sampling until imposter is met.
$$w_{i,j}=log(\lfloor |Items| \frac{M}{U}\rfloor + 1)$$
where $M$ is the number of imposters in $U$ sampled negative items.

### Loss for features
Let $x_j \in \mathcal{R}^m$ denote raw feature vector of item $j$. We want it to be close to corresponding item vector $v_j$.
$$\mathcal{L}_f=\sum_j \parallel f(x_j) - v_j\ \parallel ^2$$
where $f$ is some transformation (MLP with dropout) to process item features.

### Regularization
kNN is ineffective ineffective in high-dimensional sparse space, so we bound user and item vectors to a unit sphere.
$$\parallel \ u_* \parallel ^2 \leq 1$$
$$\parallel \ v_* \parallel ^2 \leq 1$$
$L_2$ norm is not used because it pulls every object toward the origin which does not have any specific meaning in our case. Covariance regularization is used instead to de-correlate dimensions of the learned metric.

The covariances between all pairs of dimensions $i$ and $j$ form a matrix $C$.
$$C_{i,j} = \frac{1}{N} \sum_n (y_i^n - \mu_i^n)(y_j^n - \mu_j^n)$$
where $\mu_i = \frac{1}{N}(\parallel C \parallel_f - \parallel diag(C) \parallel_2^2)$ and $\parallel \cdot \parallel_f$ is the Frobenius norm

doi.org
sci-hub
scholar.google.com

Folding: Why Good Models Sometimes Make Spurious Recommendations
Xin, Doris and Mayoraz, Nicolas and Pham, Hubert and Lakshmanan, Karthik and Anderson, John R.
Conference on Recommender Systems - 2017 via Local Bibsonomy
Keywords: dblp

[link] Summary by Darel 5 years ago

One bad item can reduce perceived quality of recommendation list. Sometimes this may be particularly undesirable such as recommending horror movies to children. Authors argue that this happens when missing not at random data is handled improperly and separate groups of users and items overlap during the process of dimensionality reduction and computation of embeddings. Folding is a metric that measures the severity of described effect in a recommendation model.

To calculate folding we must introduce the notion of relatedness between user $i$ and item $j$ which captures the likelihood of interaction between $i$ and $j$, regardless of the rating. In a way this is a form of smoothing the interaction matrix. There are different ways to calculate relatedness, but authors propose to solve matrix factorization task using WALS with high weight for missing interactions or use SVD for this purpose.

Given predicted score and relatedness matrixes $S, R \in \mathbb{R}^{m \times n}$ we can calculate folding as the average across all interactions
$$Folding = \frac{1}{mn} \sum_{i,j}max(0, s_{ij}-r_{ij})$$

doi.org
sci-hub
scholar.google.com

Are All Rejected Recommendations Equally Bad?: Towards Analysing Rejected Recommendations
Frumerman, Shir and Shani, Guy and Shapira, Bracha and Shalom, Oren Sar
ACM UMAP - 2019 via Local Bibsonomy
Keywords: dblp

[link] Summary by Darel 5 years ago

## Idea

When we recommend items to users, some of them are not chosen by the user. These rejected recommendations are usually treated as hard mistakes.

Authors argue that these bad recommendations still may influence user's choice even though they were not picked. For example user didn't click on "Die Hard" but watched another Bruce Willis movie. This seems to be a not so bad recommendation after all and maybe we should not penalize it as hard as we usually do.

Ultimate goal is to invent a metric that would have good correlation between offline results and real online performance.

## User study

Authors held a user study, showing people a set of 5 items: watched movie, 3 rejected recommendations and an item chosen after recommendation. Rejected recommendations were generated into 4 groups: 
- only high content similarity
- only high collaborative similarity
- only high popularity similarity
- all medium similarities

The question was "**How good is this recommendation 1-5?**"

| Content | Collaborative | Popularity | Other |
| ------- | ------------- | ---------- | ----- |
| 3.8     | 3.52          | 2.93       | 1.99  |

## Proposal
If standard precision is
$$p_u = \frac{|c_u \cap r_u|}{|r_u|}$$
where $c_u$ are items chosen by user and $r_u$ items recommended to user, then we can define a refined precision as 
$$p_u^{sim} = p_u + \frac{\sum_{i \in r_u \setminus c_u}max_{j \in \{ c_u:\ t(u,j) > t(u,i)\}}sim(i, j)}{|r_u|}$$
where $t(u,i)$ is the time when user $u$ interacted with item $i$.

## Evaluation
Authors used Xing dataset containing user interactions with a system for seeking employment opportunities. It contains logs of what was recommended and what was clicked.

### "Online" evaluation
Measure correlation between different refinement types of precision of RS presented in dataset and actual user clicks.

| Content | Collaborative | Regular |
| ------- | ------------- | ------- |
| 0.615   | 0.197         | 0.184   |

### Offline evaluation
Split logs 70/30 by time and measure correlation between number of clicks per user on test and metrics on train as if we were training a model on train part.



| Train clicks | Content | Collaborative | Random |
| ------------ | ------- | ------------- | ------ |
| 0.5          | 0.35    | 0.16          | 0.087  |


## Open question
What is the best way to calculate item similarity?

Darel

sciscore: 2.25