Embeddings in Recommendation Systems
Mapping users, items, and context into one vector space
The most fundamental question in RecSys is "will this user like this item?" Embeddings turn this into a distance problem: score the pair by how close their vectors are in a shared space.
If a user vector and an item vector are close, the user likely prefers that item. This is the core of Two-Tower models, and Matrix Factorization is the simplest instance of the same idea: a dot product between learned user and item vectors.
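The distance framing can be sketched in a few lines. The vectors below are made-up 4-dimensional examples, not output of any trained model:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical embeddings: the user points in roughly the same
# direction as item_a, so item_a scores higher than item_b.
user   = [0.9, 0.1, 0.4, 0.0]
item_a = [0.8, 0.2, 0.5, 0.1]
item_b = [0.0, 0.9, 0.1, 0.8]

print(cosine_similarity(user, item_a) > cosine_similarity(user, item_b))  # True
```

A dot product works the same way when the vectors are (approximately) normalized, which is why both appear interchangeably in the literature.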
Types of embeddings
ID embeddings: Learnable vectors keyed by user ID and item ID. The most basic type, but weak against cold start: an unseen ID has no trained vector.
Feature embeddings: Vectorize attributes like category, tags, price range. The Deep part of Wide&Deep does this.
Sequence embeddings: Encode entire behavior sequences into one vector. GRU4Rec, BERT4Rec fall here.
Context embeddings: Situational info like time, location, device. The same user wants different things on a commute vs. a weekend.
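A minimal sketch of the first and third types, assuming illustrative vocabulary sizes and a random (untrained) table; in a real model these rows would be learned by gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary sizes and embedding dimension.
NUM_USERS, NUM_ITEMS, DIM = 1000, 5000, 16

# ID embeddings: one learnable row per user/item ID.
user_table = rng.normal(scale=0.1, size=(NUM_USERS, DIM))
item_table = rng.normal(scale=0.1, size=(NUM_ITEMS, DIM))

def embed_user(user_id: int) -> np.ndarray:
    # Plain row lookup; an unseen ID has no trained row,
    # which is exactly the cold-start weakness noted above.
    return user_table[user_id]

def embed_history(item_ids: list[int]) -> np.ndarray:
    # A sequence embedding can be as crude as mean-pooling the item
    # vectors in a user's recent history; GRU4Rec and BERT4Rec replace
    # this pooling with learned recurrent / transformer encoders.
    return item_table[item_ids].mean(axis=0)
```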
How It Works
1. Define features for User/Item/Context
2. Vectorize each feature via an Embedding Layer
3. Combine the vectors (concat/attention) into a unified representation
4. Compute a matching score via dot product or cosine similarity
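The four steps can be sketched end to end. The feature fields, vocabulary sizes, and dimensions here are all illustrative, and the concatenation stands in for what a real two-tower model would pass through per-tower MLPs:

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 8  # per-feature embedding dimension (illustrative)

# Steps 1-2: one embedding table per feature field.
tables = {
    "user_id":  rng.normal(size=(100, DIM)),
    "category": rng.normal(size=(20, DIM)),
    "hour":     rng.normal(size=(24, DIM)),  # context feature
}

def encode(feature_ids: dict[str, int]) -> np.ndarray:
    # Step 3: look up each feature's vector and concatenate them
    # into one unified representation.
    return np.concatenate([tables[name][idx] for name, idx in feature_ids.items()])

# Two fields per side, so both representations end up 16-dimensional.
user_vec = encode({"user_id": 7, "hour": 21})
item_vec = encode({"category": 3, "hour": 21})

# Step 4: matching score via dot product.
score = float(user_vec @ item_vec)
```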
Pros
- ✓ Unifies heterogeneous data (text, image, behavior) in one space
- ✓ Millisecond serving via ANN index
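The serving claim can be illustrated with exact nearest-neighbor retrieval over a precomputed item matrix; at production scale an ANN library (e.g. FAISS or ScaNN) approximates this scan to keep latency in the millisecond range. The sizes below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
item_matrix = rng.normal(size=(10_000, 32))  # precomputed item embeddings
user_vec = rng.normal(size=32)               # computed at request time

def top_k(query: np.ndarray, items: np.ndarray, k: int = 10) -> np.ndarray:
    """Exact top-k item indices by dot product.

    An ANN index approximates this ranking without scanning every item.
    """
    scores = items @ query
    return np.argsort(scores)[::-1][:k]

recs = top_k(user_vec, item_matrix)
```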
Cons
- ✗ Requires hyperparameter tuning (dimensions, learning rate)
- ✗ Embedding drift over time → periodic retraining required