Are All Rejected Recommendations Equally Bad?: Towards Analysing Rejected Recommendations
Shir Frumerman, Guy Shani, Bracha Shapira, Oren Sar Shalom
ACM UMAP 2019
## Idea
When we recommend items to a user, some of the recommendations are not chosen. These rejected recommendations are usually treated as plain mistakes.
The authors argue that a rejected recommendation may still influence the user's choice even though it was not picked: for example, a user did not click on "Die Hard" but then watched another Bruce Willis movie. Such a recommendation is not so bad after all, and perhaps it should not be penalized as harshly as it usually is.
The ultimate goal is to design an offline metric that correlates well with real online performance.
## User study
The authors ran a user study, showing participants a set of 5 items: a watched movie, 3 rejected recommendations, and the item chosen after the recommendation. The rejected recommendations were generated according to 4 conditions:
- only high content similarity
- only high collaborative similarity
- only high popularity similarity
- all medium similarities
Participants were asked "**How good is this recommendation?**" on a 1-5 scale. Mean ratings per condition:
| Content | Collaborative | Popularity | All medium |
| ------- | ------------- | ---------- | ----- |
| 3.8 | 3.52 | 2.93 | 1.99 |
## Proposal
If the standard precision is
$$p_u = \frac{|c_u \cap r_u|}{|r_u|}$$
where $c_u$ is the set of items chosen by user $u$ and $r_u$ the set of items recommended to them, then a refined precision can be defined as
$$p_u^{sim} = p_u + \frac{\sum_{i \in r_u \setminus c_u}\ \max_{j \in c_u:\ t(u,j) > t(u,i)} \operatorname{sim}(i, j)}{|r_u|}$$
where $t(u,i)$ is the time when user $u$ interacted with item $i$, so a rejected recommendation earns credit only for the most similar item the user chose afterwards.
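Below is a minimal sketch of this metric, assuming simple per-user Python structures; the names `recommended`, `chosen`, `shown_at` and the `sim` callback are illustrative, not taken from the paper.

```python
def refined_precision(recommended, chosen, shown_at, sim):
    """Standard precision plus similarity credit for rejected recommendations.

    recommended: list of item ids shown to user u (r_u)
    chosen:      dict item_id -> time t(u, j) of the user's interaction (c_u)
    shown_at:    dict item_id -> time t(u, i) at which recommendation i was shown
    sim:         function (i, j) -> similarity score in [0, 1]
    """
    if not recommended:
        return 0.0
    hits = sum(1 for i in recommended if i in chosen)       # |c_u ∩ r_u|
    credit = 0.0
    for i in recommended:
        if i in chosen:
            continue                                         # credit only rejected items
        # chosen items the user interacted with after recommendation i was shown
        later = [j for j, t_j in chosen.items() if t_j > shown_at[i]]
        if later:
            credit += max(sim(i, j) for j in later)
    return (hits + credit) / len(recommended)                # p_u + refinement term
```

Since each rejected item contributes at most 1 to the credit, $p_u^{sim}$ stays in $[0, 1]$ as long as $\operatorname{sim}$ does.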
## Evaluation
The authors used the Xing dataset, which contains user interactions with a job-search platform, including logs of what was recommended and what was clicked.
### "Online" evaluation
For the recommender system whose recommendations are logged in the dataset, measure the correlation between each precision variant (content-refined, collaborative-refined, regular) and the actual user clicks; a sketch of this comparison follows the table.
| Content | Collaborative | Regular |
| ------- | ------------- | ------- |
| 0.615 | 0.197 | 0.184 |
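A sketch of the comparison, assuming the per-user precision values and click counts have already been computed (the variable names are hypothetical):

```python
import numpy as np

def metric_click_correlation(precision_per_user, clicks_per_user):
    """Pearson correlation between a per-user precision variant and observed clicks."""
    return float(np.corrcoef(precision_per_user, clicks_per_user)[0, 1])

# e.g. compare metric_click_correlation(content_refined, clicks)
#      with    metric_click_correlation(regular_precision, clicks)
```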
### Offline evaluation
Split the logs 70/30 by time, then measure the correlation between each metric computed on the train part and the number of clicks per user on the test part, simulating a setting where a model would be trained on the train part; a sketch of the protocol follows the table.
| Clicks on train | Content | Collaborative | Random |
| ------------ | ------- | ------------- | ------ |
| 0.5 | 0.35 | 0.16 | 0.087 |
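A sketch of this protocol, assuming the log is a pandas DataFrame with hypothetical `user`, `timestamp` and `clicked` columns (not the paper's actual schema):

```python
import numpy as np
import pandas as pd

def temporal_split(log: pd.DataFrame, train_frac: float = 0.7):
    """Send the earliest 70% of interactions to train and the rest to test."""
    cutoff = log["timestamp"].quantile(train_frac)
    return log[log["timestamp"] <= cutoff], log[log["timestamp"] > cutoff]

# train, test = temporal_split(log)
# metric_per_user = refined (or regular) precision computed from train recommendations
# test_clicks     = test.groupby("user")["clicked"].sum() restricted to the same users
# offline_corr    = np.corrcoef(metric_per_user, test_clicks)[0, 1]
```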
## Open question
What is the best way to calculate item similarity?
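Two common candidates, sketched here with numpy as an assumption rather than the paper's implementation, are cosine similarity over item content features and cosine similarity over the items' interaction vectors:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def content_sim(item_features, i, j):
    """Cosine over item content vectors (e.g. genres, actors, tags)."""
    return cosine(item_features[i], item_features[j])

def collaborative_sim(user_item_matrix, i, j):
    """Cosine between the interaction columns of items i and j."""
    return cosine(user_item_matrix[:, i], user_item_matrix[:, j])
```

How sensitive the refined precision is to this choice is exactly what the open question points at.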