What is a context window?

Length: 3 min
Published: June 9, 2026

A context window is the amount of text an AI model can consider at one time. It covers everything in play: your prompt, any documents you paste in, the earlier turns of the conversation, and the answer the model is writing. The size is measured in tokens, the chunks of text a model reads and writes. In English, one token is roughly three-quarters of a word, so a model with a 200,000-token window can hold about 150,000 words, or a few hundred pages, at once.
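The arithmetic behind that estimate can be sketched in a few lines of Python. This is a rough illustration, not a real tokenizer: the three-quarters ratio is the rule of thumb from the text, while the 500 words per page figure and the function names are assumptions made for the example.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb from the text, English only
WORDS_PER_PAGE = 500    # assumption: a fairly dense printed page


def estimate_tokens(text: str) -> int:
    """Rough token estimate based on a whitespace word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def window_capacity(window_tokens: int) -> tuple[int, int]:
    """Return (approximate words, approximate pages) that fit in a window."""
    words = int(window_tokens * WORDS_PER_TOKEN)
    pages = words // WORDS_PER_PAGE
    return words, pages


words, pages = window_capacity(200_000)
print(words, pages)  # prints: 150000 300
```

For exact counts you would use the tokenizer that belongs to the model you are calling, since each model splits text differently.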

Whatever does not fit in the window is, in effect, invisible to the model. It cannot reason about text it cannot see. This is why a long chat eventually starts to "forget" how it began: once the limit is reached, the earliest messages are typically dropped from the window to make room for new ones.
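One simple way a chat application might handle this is to keep only the most recent messages that fit a token budget. The sketch below assumes that strategy and a rough word-based token estimate. The function names are hypothetical, and real products may behave differently, for example by summarising old turns or refusing the request.

```python
import math


def estimate_tokens(text: str) -> int:
    # Rough heuristic: about 0.75 English words per token.
    return math.ceil(len(text.split()) / 0.75)


def fit_to_window(messages: list[str], budget_tokens: int) -> list[str]:
    """Keep the newest messages whose combined estimate fits the budget."""
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(message)
        if used + cost > budget_tokens:
            break  # this message and everything older is dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "My name is Dana and I work in logistics.",
    "What is a context window?",
    "It is the amount of text a model can consider at one time.",
    "Can you remind me what my name is?",
]
visible = fit_to_window(history, budget_tokens=40)
print(visible[0])  # prints: What is a context window?
```

With a budget of 40 estimated tokens, the first message no longer fits, so the model would never see the name it is being asked to recall.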

In plain words

Imagine working at a desk that fits only a fixed number of papers. You can read and connect anything on the desk right now. But when you bring in a new document and the desk is full, an old one falls off the edge. The context window is the size of that desk. The model is sharp on what is in front of it and blind to whatever has fallen off.

Why it matters

It sets how much the model can work with at once. A bigger window lets you feed in a whole contract, a large codebase, or a full conversation and get answers that account for all of it.
It explains "forgetting" in long chats. When a conversation runs past the limit, the model loses the start. Restating key points keeps them in view.
It drives cost and speed. You usually pay per token, and more tokens mean slower replies. A huge window is not free to fill.
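To see why filling a large window is not free, here is a minimal cost sketch. It assumes per-token pricing with separate rates for input and output. The prices are placeholders chosen for the example, not figures from any real provider, and the function name is hypothetical.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Return the cost of one request in the currency of the given prices."""
    total = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    )
    return total / 1_000_000


# Placeholder prices, for illustration only.
short_prompt = estimate_cost(2_000, 500, 3.0, 15.0)
full_window = estimate_cost(200_000, 500, 3.0, 15.0)
print(f"{short_prompt:.4f} vs {full_window:.4f}")  # prints: 0.0135 vs 0.6075
```

The reply is the same length in both cases. Under these assumed prices, the difference in cost comes entirely from how much text was placed in the window.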

Common pitfalls

Bigger is not always better. Stuffing the window with everything can bury the important part. Models can lose track of details in the middle of very long inputs.
The window is not memory. Once a conversation ends, the model keeps nothing. The next session starts blank unless you feed the history back in.
Tokens are not words. Limits are counted in tokens, not words or characters, so a "200K window" holds roughly 150,000 English words, not 200,000. The answer the model writes counts against the same limit. Budget accordingly.
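Budgeting can be sketched as a quick check before sending a request: reserve part of the window for the answer, then see whether the prompt and documents fit in what is left. The 4,000-token reservation, the word-based estimate, and the function names are assumptions for illustration.

```python
import math


def estimate_tokens(text: str) -> int:
    # Rough heuristic: about 0.75 English words per token.
    return math.ceil(len(text.split()) / 0.75)


def input_budget(window_tokens: int, reserved_for_answer: int) -> int:
    """Tokens left for prompt, documents, and history after reserving the reply."""
    if reserved_for_answer > window_tokens:
        raise ValueError("reservation is larger than the window")
    return window_tokens - reserved_for_answer


def fits(parts: list[str], window_tokens: int, reserved_for_answer: int) -> bool:
    """Check whether all input parts fit alongside the reserved answer space."""
    needed = sum(estimate_tokens(part) for part in parts)
    return needed <= input_budget(window_tokens, reserved_for_answer)


prompt = "Summarise the document below."
doc_150k_words = "word " * 150_000  # stand-in for a 150,000-word document
doc_140k_words = "word " * 140_000

print(fits([prompt, doc_150k_words], 200_000, 4_000))  # prints: False
print(fits([prompt, doc_140k_words], 200_000, 4_000))  # prints: True
```

A 150,000-word document is estimated at 200,000 tokens, so it fills the whole window on its own and leaves no room for the prompt or the reply.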

What is an LLM? - The kind of model whose context window you are filling with each prompt.
What is a prompt? - Everything you write counts against the window.
What is Retrieval-Augmented Generation? - Feeding only the relevant text into a limited window.
