Wide & Deep Learning for Recommender Systems
arXiv e-Print archive - 2016 via Local arXiv
cs.LG, cs.IR, stat.ML
First published: 2016/06/24
Abstract: Generalized linear models with nonlinear feature transformations are widely
used for large-scale regression and classification problems with sparse inputs.
Memorization of feature interactions through a wide set of cross-product
feature transformations are effective and interpretable, while generalization
requires more feature engineering effort. With less feature engineering, deep
neural networks can generalize better to unseen feature combinations through
low-dimensional dense embeddings learned for the sparse features. However, deep
neural networks with embeddings can over-generalize and recommend less relevant
items when the user-item interactions are sparse and high-rank. In this paper,
we present Wide & Deep learning---jointly trained wide linear models and deep
neural networks---to combine the benefits of memorization and generalization
for recommender systems. We productionized and evaluated the system on Google
Play, a commercial mobile app store with over one billion active users and over
one million apps. Online experiment results show that Wide & Deep significantly
increased app acquisitions compared with wide-only and deep-only models. We
have also open-sourced our implementation in TensorFlow.
TLDR; The authors jointly train a logistic regression model over sparse features, which is good at "memorization", and a deep feedforward net over embedded sparse features, which is good at "generalization". The model is live in the Google Play store and achieved a 3.9% gain in app acquisitions as measured by A/B testing.
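A minimal sketch of what "jointly train" means at prediction time, assuming a single hidden layer and toy hand-picked weights: the wide logit and the deep logit are summed before one sigmoid, so a single loss would update both parts together. The names `wide_deep_predict`, `params`, and all numeric values are hypothetical and are not taken from the paper's TensorFlow implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def wide_deep_predict(x_wide, ids, params):
    """Return the predicted acquisition probability for one example.

    x_wide: binary wide features (raw and cross-product features)
    ids:    integer ids of the sparse categorical features for the deep part
    """
    # Wide part: a linear model over sparse binary features.
    wide_logit = dot(params["w_wide"], x_wide)

    # Deep part: look up one embedding per sparse feature, concatenate them,
    # and pass the result through one ReLU hidden layer (assumed depth).
    emb = [v for i in ids for v in params["emb"][i]]
    hidden = [max(0.0, dot(row, emb) + b)
              for row, b in zip(params["W1"], params["b1"])]
    deep_logit = dot(params["w_deep"], hidden)

    # Joint model: both logits feed one sigmoid, unlike an ensemble where
    # each model would be trained separately and combined afterwards.
    return sigmoid(wide_logit + deep_logit + params["bias"])

# Toy parameters, chosen by hand for illustration only.
params = {
    "w_wide": [0.5, -0.25, 1.0],
    "emb": [[0.1, 0.2], [-0.3, 0.4], [0.5, -0.1]],
    "W1": [[1.0, 0.0, 0.0, 1.0], [0.0, -1.0, 1.0, 0.0]],
    "b1": [0.0, 0.0],
    "w_deep": [1.0, 2.0],
    "bias": 0.0,
}
p = wide_deep_predict([1, 0, 1], [0, 2], params)
```

Here the wide part contributes a logit of 1.5 and the deep part roughly 0.6, so `p` is the sigmoid of about 2.1.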
#### Key Points
- Wide Model (Logistic Regression) takes cross-products of binary features, e.g. "AND(user_installed_app=netflix, impression_app=pandora)", as inputs. Good at memorization.
- Deep Model alone has a hard time learning embeddings for cross-product features because there is no data for most combinations, yet it still makes predictions for them.
- Trained jointly on 500B examples.
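
The cross-product feature from the first bullet can be sketched as a logical AND over binary indicators: it is 1 only when every constituent feature is active. This is an illustrative sketch, not the authors' code; the helper `cross_product`, the dictionary layout, and the extra app names are hypothetical.

```python
def cross_product(features, required):
    """Return 1 if every (field, value) pair in `required` is active, else 0."""
    return int(all(value in features.get(field, ())
                   for field, value in required))

# One impression: apps the user has installed and the app being shown.
example = {
    "user_installed_app": {"netflix", "spotify"},
    "impression_app": {"pandora"},
}

# AND(user_installed_app=netflix, impression_app=pandora)
both = cross_product(example, [("user_installed_app", "netflix"),
                               ("impression_app", "pandora")])

# The same cross with a different impressed app does not fire.
miss = cross_product(example, [("user_installed_app", "netflix"),
                               ("impression_app", "youtube")])
```

Each such cross gets its own weight in the wide model, which is why it memorizes co-occurrences seen in training but contributes nothing for combinations that never appeared.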