
How Language Models Work in Plain English: Understanding AI's Brain

Michael Brenndoerfer • November 8, 2025

Learn how language models predict text, process tokens, and power AI agents through simple analogies and clear explanations. Understand training, parameters, and why context matters for building intelligent agents.



If you've ever used autocomplete on your phone, you've experienced a tiny glimpse of what language models do. You start typing "I'm going to the..." and your phone suggests "store," "park," or "movies." It's predicting what word comes next based on what you've typed so far.

Language models (the AI brain powering our agent) work on this same principle, just at a vastly more sophisticated scale. They're trained to predict the next word (or more precisely, the next "token") in a sequence of text. That might sound simple, but this one ability turns out to be remarkably powerful. It's what allows our AI agent to understand questions, generate coherent answers, and even appear to reason about problems.

When you ask ChatGPT a question or tell Claude to write an email, you're really just giving these models a starting point. They then predict what should come next, token by token, drawing on patterns learned from billions of pages of text. The magic isn't in some hidden understanding of meaning or consciousness. It's in the sheer scale and sophistication of pattern matching.

In this chapter, we'll demystify how these models work. No advanced math required: just some intuition about patterns, predictions, and a whole lot of text.

The Autocomplete Analogy

Let's start with something familiar. Imagine you're writing an email and you type:

The cat sat on the

What word comes next? You'd probably guess "mat," "floor," "chair," or something similar. You definitely wouldn't guess "elephant" or "democracy." Why? Because you've read thousands of sentences in your life, and you've learned patterns about how English works and what makes sense in context.

Language models learn the same way, except they've "read" billions of pages of text from books, websites, articles, and more. During training, they see countless examples of sentences and learn to predict what comes next. Over time, they build up an incredibly rich understanding of:

  • Grammar and syntax: How words fit together ("the cat sat" not "sat cat the")
  • Common phrases: Idioms, expressions, and typical word combinations
  • Factual knowledge: Paris is in France, water boils at 100°C, Shakespeare wrote plays
  • Context: How earlier words influence what should come later

The key insight is this: by learning to predict the next word really well, the model implicitly learns about language, facts, reasoning patterns, and more. It's like how learning to play chess well requires understanding strategy, tactics, and planning, even though the basic rules are simple.

Breaking Text into Tokens

Before we go further, let's clarify what models actually work with. When you type a sentence, the model doesn't see individual letters. Instead, it breaks text into tokens (chunks that might be whole words, parts of words, or even punctuation).

For example, the sentence "The AI agent is helpful!" might be split into tokens like this:

["The", " AI", " agent", " is", " helpful", "!"]

Notice that spaces are often attached to words, and common words usually get their own token. Rarer or longer words might be split into pieces. For instance, "unhelpful" might become three separate tokens: "un", "help", and "ful".

Why use tokens instead of letters or whole words? It's a practical compromise that balances efficiency with flexibility. If models worked with individual letters, they'd need to learn spelling from scratch (a huge waste of computational power). On the other hand, using whole words would create a massive vocabulary with millions of entries, including countless rare words that appear only once or twice. Tokens strike a sweet spot: a vocabulary of around 50,000 to 100,000 tokens can represent virtually any text efficiently, breaking rare words into familiar pieces while keeping common words intact.
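
If you're curious, you can inspect tokenization yourself. Here's a minimal sketch using OpenAI's tiktoken library; the exact splits and IDs vary from tokenizer to tokenizer (Claude, Gemini, and others each use their own), so treat the output as illustrative:

```python
# pip install tiktoken
import tiktoken

# Load a tokenizer used by several OpenAI models. Other model families
# use different tokenizers, so the splits below are illustrative.
enc = tiktoken.get_encoding("cl100k_base")

text = "The AI agent is helpful!"
token_ids = enc.encode(text)                    # text -> list of integer token IDs
chunks = [enc.decode([t]) for t in token_ids]   # each ID back to its text chunk

print(token_ids)  # a short list of integer IDs
print(chunks)     # e.g. ['The', ' AI', ' agent', ' is', ' helpful', '!']
```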

You don't need to worry about tokenization most of the time. Just know that when we say the model "predicts the next word," it's really predicting the next token. For simplicity, we'll keep saying "word" in this chapter, but technically it's "token."

Training: Learning from Examples

How does a language model learn to predict the next word? Through a process called training, which happens long before you ever interact with the model.

Here's the basic idea:

Step 1: Gather massive amounts of text

The model's creators collect a huge dataset: books, articles, websites, code repositories, you name it. This is the model's "reading material." The more diverse and high-quality the text, the better the model can learn.

Step 2: Show the model partial sentences

During training, the model sees a sentence with some words hidden. For example:

Input: "The capital of France is" Hidden answer: "Paris"

The model tries to predict what comes next. At first, it's basically guessing randomly.

Step 3: Correct the model's mistakes

After the model makes a prediction, the training process compares it to the actual next word. If the model guessed "London" instead of "Paris," it gets corrected. The model's internal parameters (think of these as billions of tiny knobs) get adjusted slightly to make "Paris" more likely next time in similar contexts.

Step 4: Repeat billions of times

This process repeats over and over with different examples (millions or billions of them). Gradually, the model's predictions get better. It learns that "The capital of France is" should be followed by "Paris," that "Once upon a" is often followed by "time," and that "2 + 2 =" should be followed by "4."

After training on enough text, the model develops a kind of statistical intuition about language. It hasn't memorized every sentence (there are too many), but it's learned the patterns.
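
To make that "statistical intuition" concrete, here's a toy sketch in plain Python that learns next-word frequencies from a three-sentence corpus. Real models use neural networks over tokens rather than simple counts over words, but the core idea, predicting the next word from observed patterns, is the same:

```python
from collections import Counter, defaultdict

# A tiny "training corpus". Real models train on billions of pages.
corpus = [
    "the capital of france is paris",
    "the capital of italy is rome",
    "once upon a time there was a cat",
]

# Count how often each word follows each word (a bigram model).
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1

def predict_next(word):
    """Return next-word candidates with their observed probabilities."""
    counts = follows[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(predict_next("capital"))  # {'of': 1.0}
print(predict_next("is"))       # {'paris': 0.5, 'rome': 0.5}
```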

Here's what makes this remarkable: the model isn't just learning surface-level word associations. Because it sees the same concepts expressed in countless different ways, it builds up a rich, interconnected understanding. It learns that "Paris," "the French capital," and "the city where the Eiffel Tower stands" all refer to the same place. It learns that questions ending with "?" usually expect answers, that code snippets follow specific syntax rules, and that formal writing differs from casual conversation. All of this emerges from the simple task of predicting the next token, repeated billions of times across diverse text.

How the Model Makes Predictions

Once trained, here's what happens when you give the model some text:

You provide a prompt (the starting text):

"Explain why the sky is blue in simple terms:"

The model processes it:

Internally, the model converts your prompt into tokens and runs them through its neural network (a complex mathematical structure with billions of parameters). Think of it as a very sophisticated pattern-matching machine.

The model predicts the next token:

Based on everything it learned during training, the model calculates probabilities for what token should come next. It might determine:

  • "The" →\to 35% likely
  • "It's" →\to 20% likely
  • "Because" →\to 15% likely
  • "Sky" →\to 10% likely
  • ... and thousands of other possibilities with lower probabilities

The model picks a token:

Usually, it selects a high-probability token (though not always the absolute highest; a bit of randomness keeps responses interesting). Let's say it picks "The."

The model repeats:

Now the prompt is "Explain why the sky is blue in simple terms: The" and the model predicts the next token again. Maybe it picks "sky." Then "appears." Then "blue." And so on, building up a complete response one token at a time:

"The sky appears blue because of a process called Rayleigh scattering..."

The model keeps generating tokens until it decides it's done (often by predicting a special "end" token) or hits a length limit.
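
Here's that loop as a sketch. The `predict_next_token` function below is a hypothetical stand-in for the real neural network (a real implementation would run a model or call an API), but the weighted sampling and the repeat-until-done structure mirror what actual models do:

```python
import random

END = "<end>"  # special token the model emits when it's done

def predict_next_token(text):
    """Hypothetical stand-in for the real model: returns a probability
    distribution over possible next tokens. A real model computes this
    from billions of parameters; here it's hard-coded for illustration."""
    return {"The": 0.35, "It's": 0.20, "Because": 0.15, "Sky": 0.10, END: 0.20}

def generate(prompt, max_tokens=50):
    text = prompt
    for _ in range(max_tokens):               # stop at a length limit...
        probs = predict_next_token(text)
        # Sample a token: usually high-probability, but with some
        # randomness so responses aren't identical every time.
        token = random.choices(list(probs), weights=list(probs.values()))[0]
        if token == END:                      # ...or when the model says it's done
            break
        text += " " + token
    return text

print(generate("Explain why the sky is blue in simple terms:"))
```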

What the Model Knows (and Doesn't Know)

This prediction mechanism gives language models some impressive capabilities, but also important limitations.

What models are good at:

  • Language fluency: They generate grammatically correct, natural-sounding text because they've seen millions of examples of good writing
  • Factual recall: They can answer many factual questions because facts appeared repeatedly in their training data ("Paris is the capital of France" shows up in countless documents)
  • Pattern recognition: They can complete common patterns, solve typical problems, and follow familiar formats
  • Context understanding: They track what's been said earlier in a conversation and maintain coherence

What models struggle with:

  • Rare or obscure facts: If something appeared rarely in training data, the model might not know it or might confuse it with similar information
  • Current events: Models are trained on data up to a certain cutoff date. They don't know what happened after that (unless we give them tools to look things up; more on that in Chapter 5)
  • Precise calculations: While they can do simple math, complex calculations aren't their strength because they're predicting tokens, not actually computing
  • True reasoning: They can follow reasoning patterns they've seen before, but they don't "think" in the way humans do. They're very sophisticated pattern matchers

This is why our AI agent will eventually need more than just a language model. We'll add tools for calculations, memory for context, and structured reasoning approaches to overcome these limitations. But the language model is the foundation—it's what lets our agent understand language and generate responses.

Different Sizes, Different Capabilities

Not all language models are created equal. They come in different sizes, usually measured by the number of parameters (those internal knobs we mentioned earlier).

Smaller models, with millions to a few billion parameters, are faster and cheaper to run. They work well for simple tasks like classification or generating short responses, but they may struggle with complex reasoning or rare knowledge. Think of them as quick-thinking but less experienced assistants.

Larger models, with tens to hundreds of billions of parameters, are slower and more expensive to run, but they excel at complex tasks, nuanced understanding, and following intricate instructions. They're more likely to have learned rare facts or specialized knowledge because they've had the capacity to absorb more patterns during training. These are your deep-thinking, well-read experts.

This size-capability relationship isn't arbitrary. More parameters mean more capacity to encode nuanced patterns. A small model might learn that "bank" relates to "money," but a larger model can distinguish between financial banks, river banks, and even the verb "to bank on something" based on subtle context clues. This additional capacity comes at a cost, though. A model with 100 billion parameters might take 10 times longer to generate a response than one with 10 billion parameters, and it requires significantly more memory and computational resources.

For our personal assistant agent, we'll likely use a large, capable model (like GPT-4, Claude, or Gemini) because we want it to handle a wide variety of tasks well. But knowing that smaller models exist is useful. Sometimes you can use a small model for simple subtasks to save time and cost, a strategy we'll explore in Chapter 15. This approach, called "model routing," lets you match the model size to the task complexity, getting the best balance of quality and efficiency.
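
As a rough illustration of that routing idea, here's a minimal sketch; the model names and the keyword heuristic are placeholders (a real router might use cost tables, latency budgets, or even a small model to classify the task):

```python
def route_model(task: str) -> str:
    """Pick a model size from a rough estimate of task complexity.
    The heuristic and model names are illustrative placeholders."""
    simple_keywords = ("classify", "extract", "label", "yes or no")
    if any(kw in task.lower() for kw in simple_keywords):
        return "small-fast-model"    # cheaper, lower latency
    return "large-capable-model"     # better at reasoning and nuance

print(route_model("Classify this email as spam or not spam"))      # small-fast-model
print(route_model("Draft a nuanced reply to this client letter"))  # large-capable-model
```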

A Peek Under the Hood (Optional)

If you're curious about what's really happening inside, here's a slightly deeper look (but feel free to skip this if you're satisfied with the autocomplete analogy).

Language models are built on neural networks, specifically a type called transformers. These networks consist of layers of mathematical operations that process the input tokens. The key innovation is something called attention, a mechanism that lets the model focus on relevant parts of the input when predicting the next token.

For example, when predicting what comes after "The cat sat on the mat and then it," the attention mechanism helps the model realize that "it" probably refers back to "cat," not "mat." This ability to connect distant parts of text is what makes modern language models so powerful.

The model's billions of parameters encode all the patterns it learned during training. When you give it a prompt, these parameters work together to calculate probabilities for the next token, drawing on everything the model has learned.
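
For the mathematically inclined, here's the core attention computation in a few lines of NumPy. This is a bare-bones sketch of scaled dot-product attention with made-up numbers; real transformers run many of these in parallel (multi-head attention) across dozens of layers, with Q, K, and V produced by learned projections:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)   # how relevant each token is to each other token
    weights = softmax(scores)       # turn scores into probabilities per row
    return weights @ V              # weighted mix of the value vectors

# Three tokens, each represented by a 4-dimensional vector (made-up numbers).
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
# In a real model, Q, K, V come from learned projections of x;
# reusing x here keeps the sketch minimal.
out = attention(x, x, x)
print(out.shape)  # (3, 4): one updated vector per token
```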

You don't need to understand the math to use language models effectively, just like you don't need to understand engine mechanics to drive a car. But knowing that there's a sophisticated mechanism underneath can help you appreciate both the capabilities and the limitations.

Why This Matters for Our Agent

Understanding how language models work helps us build better agents. Here are some key takeaways that will guide our journey ahead.

The model is a predictor, not a database. It doesn't look up facts in a stored table; it predicts what text is likely based on patterns. This means it can sometimes "hallucinate," confidently generating plausible-sounding but incorrect information. We'll need to add fact-checking mechanisms and external tools to our agent to ground it in reality.

Context is crucial. The model only knows what's in the prompt you give it. If you want it to remember earlier conversations, you'll need to include that history in each prompt. We'll cover this in Chapter 6 on memory, where you'll learn how to give your agent a working memory that spans multiple interactions.
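
In practice, that means resending the whole conversation with each new message. Here's a sketch using the role-tagged message format most chat APIs share; `call_model` is a placeholder, not a real client call:

```python
def call_model(messages):
    """Placeholder for a real API call (OpenAI, Anthropic, etc.)."""
    return "(model reply would appear here)"

# The model only "remembers" what appears in this list:
# drop a message and it's forgotten.
history = [
    {"role": "system", "content": "You are a helpful personal assistant."},
    {"role": "user", "content": "My name is Sam."},
    {"role": "assistant", "content": "Nice to meet you, Sam!"},
]

def ask(question):
    history.append({"role": "user", "content": question})
    reply = call_model(history)   # the full history goes along every time
    history.append({"role": "assistant", "content": reply})
    return reply

# Because the earlier messages are included, the model can answer:
# ask("What's my name?")  ->  "Your name is Sam."
```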

Garbage in, garbage out. The quality of the model's output depends heavily on the quality of your prompt. Clear, specific instructions lead to better results, while vague or confusing prompts produce vague or confusing answers. Chapter 3 will teach you how to craft effective prompts that get the best from your model.

It's just the starting point. The language model is the brain, but a complete agent needs more. It needs tools to interact with the world, memory to maintain context, reasoning strategies to solve problems, and guardrails to stay safe. The rest of this book will show you how to build all of that around this powerful core.

Wrapping Up

Language models might seem like magic, but they're really just very sophisticated pattern-matching systems trained on enormous amounts of text. They predict the next word based on what they've learned, and that simple mechanism enables them to understand and generate human-like text.

Think of the language model as a brilliant but somewhat naive assistant. It has read everything and can talk about almost anything, but it doesn't truly understand the world the way you do. It can't check facts in real-time, do complex math reliably, or remember what you told it yesterday (unless you remind it).

That's where the rest of our agent architecture comes in. In the next chapter, we'll get hands-on and actually use a language model in code, seeing how to send it prompts and get responses. Then, step by step, we'll add all the capabilities that transform this text-predicting engine into a capable, reliable personal assistant.

The journey from "autocomplete on steroids" to "intelligent agent" is what this book is all about. And it all starts with understanding this core component: the language model that serves as our agent's brain.

Glossary

Token: A chunk of text that a language model processes, which might be a whole word, part of a word, or punctuation. Models break text into tokens before processing it, typically using a vocabulary of 50,000 to 100,000 tokens.

Training: The process where a language model learns to predict text by seeing billions of examples from books, websites, and other sources. During training, the model's parameters are adjusted to make better predictions.

Parameters: The internal values (like billions of tiny knobs) that encode everything a language model has learned. Larger models have more parameters and can typically handle more complex tasks.

Prompt: The input text you give to a language model. The model uses this as context to predict what should come next, generating its response token by token.

Hallucination: When a language model generates plausible-sounding but incorrect information. This happens because models predict based on patterns rather than looking up verified facts.

Neural Network: The mathematical structure underlying language models, consisting of layers that process input tokens. Modern language models use a specific type called transformers.

Attention: A mechanism in transformer models that helps them focus on relevant parts of the input when making predictions. For example, it helps the model connect "it" to "cat" in the sentence "The cat sat on the mat and then it moved."

