In 2023, a Microsoft researcher named Ronen Eldan asked a question that sounds almost too simple: what if you trained an AI model using only words a four-year-old could understand?
He wasn’t being cute. He was testing a hypothesis that challenged everything the AI industry believed about building smarter systems. At the time, the prevailing wisdom was brutally clear: bigger models, trained on more data, produce better results. GPT-4 reportedly uses over a trillion parameters. Google’s Gemini Ultra, roughly the same. The industry was in a scaling arms race, and the only direction was up.
Eldan went the other way. He built a dataset called TinyStories using just 3,000 simple words, then trained a model with only 10 million parameters on it. The model could write coherent short stories. Not great literature, but readable, structured narratives from a system small enough to fit on a thumb drive. That experiment sparked a research program at Microsoft that would produce the Phi family of models and help launch an entirely new category in AI: the small language model.
What It Actually Means
A small language model is built on the same transformer architecture as the large language models you’ve probably already encountered, like GPT-4 or Claude. It understands and generates text using the same fundamental approach: read a prompt, predict the most likely next word, repeat. The difference is scale. Where large models have hundreds of billions (or trillions) of parameters, small models typically range from one billion to about fourteen billion.
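That read-predict-repeat loop can be caricatured in a few lines of Python. This is a toy sketch, not a real transformer: the hypothetical `toy_model` lookup table stands in for billions of learned parameters, but the control flow is the same idea — look at what came before, pick the most likely next word, append it, repeat.

```python
# Toy sketch of the predict-next-word loop (NOT a real transformer).
# A real model scores every word in its vocabulary using learned
# parameters; here a hypothetical lookup table plays that role.
toy_model = {
    "the": "cat",
    "cat": "sat",
    "sat": "on",
    "on": "the",
}

def generate(prompt: str, steps: int) -> str:
    """Repeatedly predict the most likely next word and append it."""
    words = prompt.split()
    for _ in range(steps):
        next_word = toy_model.get(words[-1])
        if next_word is None:  # no prediction available: stop early
            break
        words.append(next_word)
    return " ".join(words)

print(generate("the", 3))  # → "the cat sat on"
```

Whether the model has ten million parameters or a trillion, generation is this same loop; scale changes how good each prediction is, not the mechanism.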
If a large language model is a research university library with millions of volumes on every conceivable subject, a small language model is a well-curated personal bookshelf. It won’t have an answer for everything, but what it does know, it knows well. And you don’t need a building to house it.
That last part is the point. SLMs are compact enough to run on your laptop, your tablet, or even your phone, with no internet connection required.
The Textbook Revolution
The story of how small models got good is really a story about training data.
For years, AI labs fed their models the internet, more or less all of it. The assumption was that more data meant better performance, even if a lot of that data was Reddit arguments, spam, and SEO filler. Eldan’s TinyStories experiment suggested something different: maybe quality matters more than quantity.
His colleagues at Microsoft Research, led by Sébastien Bubeck, ran with the idea. In June 2023, they published a paper with a title that read like a thesis statement: “Textbooks Are All You Need.” Their model, Phi-1, had just 1.3 billion parameters and was trained for four days on eight GPUs. Instead of scraping the web indiscriminately, they curated a small dataset of textbook-quality material: clear explanations, worked examples, structured reasoning. The results were startling. Phi-1 matched models five times its size on coding benchmarks.
By December 2023, the team’s next model, Phi-2 (2.7 billion parameters), was outperforming Meta’s Llama-2-70B on multi-step reasoning tasks. A model beating one 25 times its size. The AI equivalent of a paperback outselling an encyclopedia.
Other labs took notice quickly. Mistral AI, a French startup, released Mistral 7B in September 2023 under an open-source license, and it embarrassed models twice its size across nearly every benchmark. Google followed with its Gemma family. Meta released compact versions of Llama. And in June 2024, Apple made the biggest statement of all: it built Apple Intelligence, the AI system powering every new iPhone and Mac, on a three-billion-parameter small language model. When the world’s most valuable company bets its on-device AI strategy on a small model rather than a large one, the category has officially arrived.
How They Punch Above Their Weight
Three techniques make this possible.
Curated training data is the big one. Instead of learning from the chaotic internet at large, SLMs learn from carefully selected, high-quality material. Think of it as the difference between learning to write by reading a thousand great novels versus reading a million random blog posts. Both give you exposure to language, but one makes you a dramatically better writer.
Knowledge distillation lets a small model learn from a large one. A powerful “teacher” model (like GPT-4) generates training examples, and the smaller “student” model learns to reproduce not just the right answers but the reasoning patterns behind them. The student absorbs the teacher’s instincts without needing the teacher’s size.
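A rough sketch of that idea in plain Python, in place of a real training framework: the `distillation_loss` function below is an illustrative stand-in, not any lab's actual recipe. The student is scored on how closely its full probability distribution over possible next words matches the teacher's softened distribution, which is how it absorbs patterns of reasoning rather than just final answers.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; higher temperature softens them,
    exposing the teacher's 'second choices' as well as its top answer."""
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the student's distribution and the teacher's
    softened distribution. Training nudges the student's scores until the
    two distributions line up."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))
```

A student whose scores mirror the teacher's everywhere gets a lower loss than one that merely agrees on the single top answer — which is exactly why distilled models inherit instincts, not just facts.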
Quantization reduces the numerical precision of a model’s internal math. Standard models store their weights as 16- or 32-bit numbers. Quantized models use 8-bit or even 4-bit numbers instead, dramatically shrinking their memory footprint. Apple’s on-device model averages 3.7 bits per weight and still generates about 30 tokens (roughly words) per second on an iPhone. It’s like compressing a high-resolution photo to fit in a text message while keeping it sharp enough that nobody notices the difference.
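A minimal sketch of the idea in Python — simple symmetric integer quantization, much simplified from what production runtimes actually do: each float weight becomes a small integer plus one shared scale factor, so an 8-bit version needs a quarter of the memory of a 32-bit one.

```python
def quantize(weights, bits=8):
    """Map float weights onto a small integer grid (symmetric quantization).
    Returns the integers plus the one scale factor needed to undo the mapping."""
    levels = 2 ** (bits - 1) - 1                 # e.g. 127 levels for 8-bit
    scale = max(abs(w) for w in weights) / levels
    return [round(w / scale) for w in weights], scale

def dequantize(ints, scale):
    """Recover approximate float weights from the stored integers."""
    return [i * scale for i in ints]

weights = [0.12, -0.50, 0.33, 0.01]
q, scale = quantize(weights, bits=8)
approx = dequantize(q, scale)
# Each weight is now stored in 8 bits instead of 32 (a 4x saving),
# and each reconstructed value is off by at most half a grid step.
```

The trade-off is a tiny rounding error on every weight; the engineering surprise of the last few years is how much rounding a model can absorb before anyone notices.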
Why Authors Should Care
The practical value comes down to four things: privacy, cost, speed, and access.
Privacy. When you brainstorm with ChatGPT or Claude, your words travel to a remote server. For many authors, that’s perfectly fine. But if you’re working on an unpublished manuscript, a sensitive memoir, or client work under NDA, sending your text to the cloud can feel uncomfortable. A small model running on your laptop keeps everything local. Your drafts never leave your machine.
Cost. Most SLMs are free to use. Tools like Ollama, LM Studio, and Jan.ai let you download and run open-source models without paying for API access or a monthly subscription. For authors who use AI frequently, the savings over cloud-based tools add up.
Speed. Small models respond in milliseconds, not seconds. When you’re deep in a writing flow and need a quick brainstorm or a reworded sentence, that near-instant response keeps your momentum intact instead of breaking it.
Offline access. No Wi-Fi at the cabin? Unreliable connection on a train? A local model doesn’t care. It runs entirely on your hardware, which means it works wherever you do.
Apple Intelligence is already putting a small language model in the hands of millions of authors who may not realize it. Every time you use Writing Tools on your iPhone to proofread a paragraph or rewrite an email in a different tone, a small language model is doing the work, right on your device, without sending a single word to the cloud.
SLMs aren’t a replacement for the big models. They can’t match a GPT-4 or Claude when it comes to complex, multi-step reasoning or deeply nuanced creative direction. But for quick writing tasks, brainstorming, summarization, and first-draft assistance, they’re remarkably capable, and they’re getting better with every new release. A study accepted at the 2025 COLING conference even found that a fine-tuned small model produced more surprising creative associations than GPT-4o (15% of its story ideas contained unexpected connections, compared to just 3% from the larger model). For brainstorming, a little unpredictability can be a feature, not a bug.
The AI tools you rely on are no longer only in the cloud. They’re on your desk, in your pocket, and many of them are free. Understanding what small language models can do puts you in a better position to choose the right tool for each part of your writing process.