Wide & Deep Learning for Recommender Systems
Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Rohan Anil, Zakaria Haque, Lichan Hong, Vihan Jain, Xiaobing Liu, and Hemal Shah
arXiv e-Print archive - 2016 via Local arXiv
Keywords:
cs.LG, cs.IR, stat.ML
First published: 2016/06/24
Abstract: Generalized linear models with nonlinear feature transformations are widely
used for large-scale regression and classification problems with sparse inputs.
Memorization of feature interactions through a wide set of cross-product
feature transformations are effective and interpretable, while generalization
requires more feature engineering effort. With less feature engineering, deep
neural networks can generalize better to unseen feature combinations through
low-dimensional dense embeddings learned for the sparse features. However, deep
neural networks with embeddings can over-generalize and recommend less relevant
items when the user-item interactions are sparse and high-rank. In this paper,
we present Wide & Deep learning---jointly trained wide linear models and deep
neural networks---to combine the benefits of memorization and generalization
for recommender systems. We productionized and evaluated the system on Google
Play, a commercial mobile app store with over one billion active users and over
one million apps. Online experiment results show that Wide & Deep significantly
increased app acquisitions compared with wide-only and deep-only models. We
have also open-sourced our implementation in TensorFlow.
TLDR; The authors jointly train a logistic regression model over sparse features, which is good at "memorization", and a deep feedforward net over embedded sparse features, which is good at "generalization". The model is live in the Google Play store and achieved a 3.9% gain in app acquisitions, as measured by A/B testing.
#### Key Points
- Wide Model (Logistic Regression) takes cross products of binary features as inputs, e.g. "AND(user_installed_app=netflix, impression_app=pandora)". Good at memorization.
- Deep Model alone has a hard time learning embeddings for cross-product features because there is no data for most combinations, yet it still makes predictions for them.
- Trained jointly on 500B examples.
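
To make the two components concrete, here is a minimal NumPy sketch of a cross-product feature and of a joint forward pass in which the wide and deep logits are summed before a single sigmoid. It is not the authors' TensorFlow implementation: the function names, the single ReLU hidden layer, and all shapes are assumptions chosen for illustration, and training (in which both parts receive gradients from the same loss) is omitted.

```python
import numpy as np


def cross_product(features, required):
    """AND over categorical features.

    Returns 1.0 only if every (name, value) pair in `required` is present
    in `features`, mirroring AND(user_installed_app=netflix, impression_app=pandora).
    """
    return float(all(features.get(name) == value for name, value in required.items()))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def wide_and_deep_prob(x_wide, w_wide, emb_ids, emb_table, W_hidden, b_hidden, w_deep, bias):
    """Joint prediction: sigmoid(wide logit + deep logit + bias).

    All parameter names and the one-hidden-layer depth are illustrative assumptions.
    """
    # Wide part: linear model over sparse binary features (including cross products).
    wide_logit = x_wide @ w_wide
    # Deep part: look up one embedding per categorical id and concatenate them.
    dense = emb_table[emb_ids].reshape(-1)
    # One ReLU hidden layer (the production model is deeper; this is a sketch).
    hidden = np.maximum(0.0, W_hidden @ dense + b_hidden)
    deep_logit = hidden @ w_deep
    # The two logits are added, so one loss would train both parts jointly.
    return sigmoid(wide_logit + deep_logit + bias)


# Hypothetical example: one impression with two categorical features.
example = {"user_installed_app": "netflix", "impression_app": "pandora"}
crossed = cross_product(example, {"user_installed_app": "netflix",
                                  "impression_app": "pandora"})

rng = np.random.default_rng(0)
x_wide = np.array([crossed])                 # single wide feature: the cross product
w_wide = rng.normal(size=1)
emb_table = rng.normal(size=(10, 4))         # 10 ids, 4-dimensional embeddings
emb_ids = np.array([3, 7])                   # ids standing in for the two apps
W_hidden = rng.normal(size=(8, 8))           # 8 hidden units over 2 * 4 inputs
b_hidden = np.zeros(8)
w_deep = rng.normal(size=8)

prob = wide_and_deep_prob(x_wide, w_wide, emb_ids, emb_table,
                          W_hidden, b_hidden, w_deep, bias=0.0)
```

The cross-product feature fires only for the exact combination it encodes, which is what lets the wide part memorize specific co-occurrences. The embedding lookup produces a prediction for any pair of ids, including pairs never seen together, which is where the deep part's generalization (and over-generalization) comes from.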