
Did you know that Netflix has a huge team of researchers, and that much of what you watch on Netflix is influenced by its system for recommending titles? Have you ever wondered how that recommendation system works?
The recommendations you see are the result of powerful recommendation models. Originally, each section - e.g. "Continue Watching" or "Next time don't miss" - had its own model. Each drew on the same data sources as the others, but was trained separately. Maintaining and improving all of these individual models was becoming increasingly difficult and expensive.
This year, Netflix is starting to move towards a unified, comprehensive approach - building one powerful foundation model that understands user behavior and preferences and can share that knowledge across all of its recommendation systems.
From many models, one supermodel 👀
Originally, Netflix had a bunch of smaller models, each trained separately. One remembered, say, what you liked in action movies, while another recommended shows that were currently popular. But the models didn't talk to each other, which caused problems whenever they needed to be updated or improved.
Netflix's new approach is inspired by how large language models (LLMs) work. Instead of building lots of small models, it now builds one big one that understands your viewing habits as a whole. This model can then help other systems by sharing what it has learned - either directly or through reusable embeddings (compact, learned representations of users and titles).
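To make that sharing pattern concrete, here is a minimal Python sketch. It is not Netflix's code: the class names (FoundationModel, RowRanker), the hashed 64-dimensional embedding, and the sample titles are all invented to show the interface - one big model produces a user representation, and downstream recommenders reuse it instead of learning everything from scratch.

```python
import numpy as np

class FoundationModel:
    """Stands in for the big model trained on all viewing history (hypothetical)."""

    def user_embedding(self, interaction_tokens: list[str]) -> np.ndarray:
        # In reality this would be a learned sequence model; here we just hash
        # tokens into a fixed-size vector to illustrate the interface.
        vec = np.zeros(64)
        for tok in interaction_tokens:
            vec[hash(tok) % 64] += 1.0
        return vec / max(len(interaction_tokens), 1)

class RowRanker:
    """A downstream system, e.g. the model behind one homepage row (hypothetical)."""

    def __init__(self, title_vectors: dict[str, np.ndarray]):
        self.title_vectors = title_vectors

    def rank(self, user_vec: np.ndarray) -> list[str]:
        # Rank titles by similarity to the shared user embedding.
        scores = {t: float(v @ user_vec) for t, v in self.title_vectors.items()}
        return sorted(scores, key=scores.get, reverse=True)

foundation = FoundationModel()
user_vec = foundation.user_embedding(["watched:stranger_things", "skipped:trailer_x"])
ranker = RowRanker({"Dark": np.random.rand(64), "Wednesday": np.random.rand(64)})
print(ranker.rank(user_vec))
```

The point of the sketch is the split: the expensive learning happens once in the foundation model, and each row-level system only consumes its output.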
Tokenization = Converting viewing habits into tokens
Netflix is a professional stalker. It watches your every interaction: what you watch, for how long, what you skip, even which device you use and in what language. But this raw data alone is not enough. Netflix therefore converts these interactions into tokens - units of behavior such as "watched Stranger Things for 40 minutes on my phone tonight".
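A rough sketch of what that conversion might look like, with an invented event schema (title, minutes watched, device, language) and made-up engagement buckets - not Netflix's actual format:

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    title: str
    minutes_watched: int
    device: str
    language: str

def to_token(event: Interaction) -> str:
    # Bucket the watch time so that 38 minutes and 42 minutes map to the same
    # behavioral unit, which keeps the token vocabulary manageable.
    if event.minutes_watched >= 30:
        engagement = "long_watch"
    elif event.minutes_watched >= 5:
        engagement = "short_watch"
    else:
        engagement = "skip"
    return f"{engagement}|{event.title}|{event.device}|{event.language}"

token = to_token(Interaction("Stranger Things", 40, "phone", "en"))
print(token)  # long_watch|Stranger Things|phone|en
```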
The model is fed these tokens to learn how users behave over time. This is where the next challenge comes in - users do a lot of things. Netflix therefore has to decide how much detail to retain, while still making sure the data can be processed quickly.
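One hedged illustration of that trade-off: drop detail the model probably doesn't need (for example, collapse runs of identical events) and cap the sequence length so it stays fast to process. The merging rule and the window size of 512 here are arbitrary choices for the sketch.

```python
def compress_history(tokens: list[str], max_len: int = 512) -> list[str]:
    deduped: list[str] = []
    for tok in tokens:
        # Merge consecutive duplicates (e.g. many tiny plays of the same episode).
        if not deduped or deduped[-1] != tok:
            deduped.append(tok)
    # Keep only the most recent max_len events.
    return deduped[-max_len:]

history = ["skip|Trailer A", "skip|Trailer A", "long_watch|Stranger Things"]
print(compress_history(history, max_len=2))
# ['skip|Trailer A', 'long_watch|Stranger Things']
```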
A model that learns like a person, not just a machine
As we mentioned, Netflix took inspiration from LLMs, which predict the next word or token. Netflix, however, wants to predict the next action a user might take. And because there are lots of possible actions, they are given different weights - watching a full movie, for example, says more about you than watching a three-minute trailer. The model learns to weigh what is important, which lets it better recommend shows you might actually like.
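As a sketch of what "different weights" could mean during training, here is a weighted next-action prediction loss in PyTorch: completing a movie contributes more to the loss than glancing at a trailer. The action names and weight values are invented for illustration, not taken from Netflix.

```python
import torch
import torch.nn.functional as F

# Hypothetical action vocabulary and per-action training weights.
ACTION_WEIGHTS = {"finished_movie": 1.0, "watched_episode": 0.8, "watched_trailer": 0.2}
ACTIONS = list(ACTION_WEIGHTS)

def weighted_next_action_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """logits: (batch, num_actions); targets: (batch,) indices into ACTIONS."""
    per_example = F.cross_entropy(logits, targets, reduction="none")
    weights = torch.tensor([ACTION_WEIGHTS[ACTIONS[int(t)]] for t in targets])
    # Heavier actions pull the model harder toward predicting them correctly.
    return (weights * per_example).mean()

logits = torch.randn(3, len(ACTIONS))
targets = torch.tensor([0, 2, 1])  # finished_movie, watched_trailer, watched_episode
print(weighted_next_action_loss(logits, targets))
```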