Context Window

You’re three thousand words into a conversation with ChatGPT about your novel. You’ve described your protagonist in detail, established the 1940s New Orleans setting, and explained the complicated subplot about the missing sister. You ask the AI to write the next scene, and it calls your protagonist by the wrong name. Or invents a brother who never existed. Or sets the whole thing in modern-day Chicago.

The AI didn’t glitch. It ran out of room.

What a Context Window Actually Is

A context window is the total amount of text an AI model can hold in its head at one time. Think of it as the model’s working memory, or its desk. Everything within the window (your instructions, the conversation so far, the model’s own responses) is right there, visible, available. Everything outside the window might as well not exist.

The size of a context window is measured in tokens, chunks of text that average roughly three-quarters of a word each (so 100 tokens is about 75 words of English). When a model’s context window fills up, the oldest content gets quietly pushed out, like papers sliding off the far edge of a crowded desk. The AI doesn’t warn you. It doesn’t summarize what it lost. It just proceeds as if that earlier text had never been part of the conversation.
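That sliding-desk behavior can be sketched in a few lines of Python. The token estimate below is the three-quarters-of-a-word rule of thumb, not a real tokenizer (tools like OpenAI’s tiktoken library count exactly), and `fit_to_window` is an illustrative helper, not any chatbot’s actual code:

```python
def estimate_tokens(text: str) -> int:
    """Rule of thumb: one token is roughly three-quarters of a word,
    so tokens ~= words * 4/3. Real tokenizers give exact counts."""
    return round(len(text.split()) * 4 / 3)

def fit_to_window(messages: list[str], window_tokens: int) -> list[str]:
    """Keep the newest messages that fit; silently drop the oldest,
    like papers sliding off the far edge of a crowded desk."""
    kept, used = [], 0
    for message in reversed(messages):        # walk from newest to oldest
        cost = estimate_tokens(message)
        if used + cost > window_tokens:
            break                             # everything older falls off
        kept.append(message)
        used += cost
    return list(reversed(kept))               # restore chronological order
```

Note that the function never reports what it dropped, which is exactly why a long conversation can lose your protagonist’s name without any visible error.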

A Term with Deeper Roots Than You’d Expect

“Context window” didn’t originate with ChatGPT, or even with the transformer architecture that powers it. The phrase has been around in computer science for decades, borrowed from a simple idea in signal processing: if you’re analyzing a stream of data, focus on a fixed-size slice at a time and slide that slice forward as you go. Linguists adapted the concept for language. When early statistical models tried to predict the next word in a sentence, they’d look at a small window of surrounding words (two, three, maybe five) and use those as context clues.

The term crystallized in its modern NLP form around 2013, when Tomas Mikolov and his colleagues at Google published their influential Word2Vec papers. In Word2Vec, the context window was an explicit setting you could adjust: how many words on either side of a target word should the model consider when learning what that word means? A window of two around the word “loves” in “the man loves his son” would capture four surrounding words. Small windows caught tight grammatical relationships. Larger ones caught broader thematic connections.
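The Word2Vec-style window is simple enough to reproduce directly. A minimal sketch of collecting each word’s surrounding context (the `context_pairs` name is illustrative, not Word2Vec’s API):

```python
def context_pairs(sentence: list[str], window: int):
    """For each target word, gather up to `window` words on either
    side, as in Word2Vec's skip-gram training setup."""
    pairs = []
    for i, target in enumerate(sentence):
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        context = sentence[lo:i] + sentence[i + 1:hi]  # skip the target itself
        pairs.append((target, context))
    return pairs
```

Running it on the sentence from the text, a window of two around “loves” yields exactly the four surrounding words: “the”, “man”, “his”, “son”.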

When the transformer architecture arrived in 2017, the meaning of “context window” scaled up dramatically. Instead of a handful of neighboring words used during training, it became the entire span of text the model could process at once during a conversation. Same metaphor, radically different scope. A Word2Vec context window might hold ten words. A modern large language model can hold the equivalent of several novels.

From a Peephole to a Picture Window

The growth of context windows over the past few years has been staggering.

The original transformer in 2017 could handle 512 tokens, roughly a page of text. GPT-2 doubled that to 1,024. GPT-3, which launched the modern AI era, managed 2,048 tokens, enough for about three pages. When ChatGPT debuted in late 2022, it worked with 4,096 tokens, still just a handful of pages.

Then the race began in earnest. GPT-4 arrived in early 2023 with 8,192 tokens (and a 32,000-token variant). A few months later, Anthropic’s Claude 2 leapt to 100,000 tokens, enough to hold an entire novel. By 2024, Google’s Gemini 1.5 Pro reached two million tokens. Today, some models claim context windows of ten million tokens or more.

This expansion wasn’t easy. The transformer’s self-attention mechanism, the very thing that makes it so good at understanding language, has an expensive habit: every token checks in with every other token. Double the context length and you quadruple the computation required. Engineers who expanded context windows from thousands to millions of tokens had to invent clever workarounds (sparse attention patterns, more efficient memory management, new ways of encoding position) to keep the math from becoming impossibly expensive.
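The quadratic cost is easy to see with a toy count of token-to-token comparisons. This is a sketch of full self-attention’s scaling only, ignoring constants and the workarounds mentioned above:

```python
def attention_pair_count(n_tokens: int) -> int:
    """Full self-attention lets every token attend to every token
    (itself included), so the number of comparisons is n squared."""
    return n_tokens * n_tokens

# Doubling the context quadruples the work:
print(attention_pair_count(1_024))  # 1,048,576 comparisons
print(attention_pair_count(2_048))  # 4,194,304 comparisons, four times as many
```

At a million tokens the naive count reaches a trillion comparisons, which is why million-token windows required new attention schemes rather than just bigger hardware.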

But bigger isn’t always better. Researchers have uncovered a quirk called the “lost in the middle” problem: models retrieve information most reliably from the beginning and end of their context window, with accuracy dipping for content stuck in the middle. A model with a 200,000-token window may start getting fuzzy well before it reaches capacity. The window is real, but the view isn’t equally sharp across its entire span.

Why This Matters for Your Writing Life

The context window is the single most important constraint shaping how you work with AI writing tools, and understanding it immediately makes you better at using them.

It determines how much of your manuscript the AI can “see.” A typical 80,000-word novel is roughly 107,000 tokens. With Claude’s 200,000-token context window, the entire manuscript fits with room to spare, meaning the AI can check for consistency, track character arcs, and suggest edits with your whole book in view. A model with a 4,000-token window can hold only a scene or two at a time.
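As a back-of-the-envelope check, the arithmetic in that paragraph can be wrapped in a helper. The 1.33 tokens-per-word ratio is the same rough conversion used throughout; `fits_in_window` is an illustrative name, and a real tokenizer would give the exact count:

```python
def fits_in_window(word_count: int, window_tokens: int,
                   tokens_per_word: float = 4 / 3) -> bool:
    """Rough check: does a manuscript of `word_count` words fit in a
    model's context window, at ~1.33 tokens per English word?"""
    return word_count * tokens_per_word <= window_tokens

# An 80,000-word novel (~107,000 tokens) fits in a 200,000-token
# window but nowhere near a 4,000-token one.
```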

It explains why long conversations go sideways. When you and an AI have been chatting for a while, the oldest messages eventually scroll out of the context window without warning. The model doesn’t know it’s forgotten anything. It just starts generating responses based on an incomplete picture. If your AI collaborator suddenly contradicts itself or loses the thread, this is almost always why. Starting a fresh conversation and restating the essential context usually fixes it.

It’s why specialized writing tools exist. Apps like NovelCrafter don’t just dump your entire manuscript into the context window and hope for the best. NovelCrafter uses a structured database of characters, locations, and plot points (called the Codex), selectively injecting only the most relevant details for each scene. This is smart engineering built specifically around context window limitations, and it’s a big part of why purpose-built writing tools often outperform a raw chatbot for novel-length work.

It shapes how you should structure your prompts. Because of the “lost in the middle” effect, put your most important instructions at the very beginning of a long prompt, or repeat them near the end. Style guides, character voice notes, crucial plot constraints: front-load them. The AI will pay the most attention to what it sees first and last.
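If you assemble long prompts programmatically, that advice translates into a trivial pattern: state the key instructions up front and echo them at the end, where the model’s attention is sharpest. A minimal sketch (`build_prompt` is a made-up helper, not any tool’s API):

```python
def build_prompt(instructions: str, material: str) -> str:
    """Place key instructions first and repeat them last, since
    models recall the ends of the window better than the middle
    (the 'lost in the middle' effect)."""
    return f"{instructions}\n\n{material}\n\nReminder: {instructions}"
```

For example, `build_prompt("Keep the 1940s New Orleans setting.", chapter_text)` puts the constraint in both of the window’s most reliable positions.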

The context window is expanding fast, and a future where an AI can hold your entire series in its head at once isn’t far off. But for now, knowing the size of the room you’re working in, and arranging your furniture accordingly, is one of the most practical things you can learn about these tools.