What an LLM actually is (and isn't)

A large language model is a program that predicts the next piece of text, given everything that came before it. That’s the whole trick. Everything an LLM does (answering questions, writing code, summarizing contracts) emerges from doing that one prediction extremely well, one piece after another, hundreds or thousands of times per response.

This sounds reductive, and it is. It’s also the most reliable mental model available, because every strength and every failure of these systems traces back to it. Hold onto it and the daily AI news starts sorting itself into “new wrapper around the predictor” and “actually new.” Lose it and every product demo looks like magic, which is exactly how expensive mistakes get made.

What it is

During training, the model reads a very large amount of text and adjusts billions of internal numbers until its predictions match what people actually wrote. The result is something like compressed experience: patterns of grammar, fact, style, and reasoning, all stored as statistics rather than as records.
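
To make "patterns stored as statistics rather than as records" concrete, here is a deliberately tiny sketch in Python. It is a toy, not how a real LLM is built: real models use neural networks over tokens, while this one only counts which word followed which. The function names and the sample corpus are made up for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


def train_bigram(text: str) -> Dict[str, Counter]:
    """Count, for each word, which words followed it in the training text."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical training text; the sentences are not stored, only the counts.
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram(corpus)

print(predict_next(model, "the"))  # cat ("cat" followed "the" twice)
print(predict_next(model, "dog"))  # None (never seen in training)
```

Notice that the original sentences are gone after training. Only the counts remain, which is why the model can produce text that never existed and also why it cannot look up what it read.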

Three properties follow directly:

It generalizes. It can write a sea shanty about tax law because it has the patterns of both, even though that document never existed. This is the genuinely new capability — recombining patterns on demand — and it’s why the same tool can draft a contract clause, debug a spreadsheet formula, and explain the clause to a customer.
It’s fluent by default. Confident, well-formed prose is the thing it’s best at — independent of whether the content is true. Fluency is the product of the training objective, not a signal of accuracy. This single fact explains most AI embarrassments you’ve read about.
It has no lookup step. Unless connected to a search tool, it isn’t checking anything. It’s recalling the shape of an answer. Sometimes the shape comes out exactly right; sometimes it’s an answer-shaped object with the wrong name, date, or number inside.

Tokens and the context window

Two terms come up constantly, and both fit the prediction frame. The model reads and writes in tokens — chunks of a few characters, roughly three-quarters of a word in English. The context window is how many tokens it can consider at once: its working memory. Everything you paste into a conversation lives in the context window; everything from training lives in the frozen statistics. The practical difference matters. Text in the window can be quoted exactly. Knowledge from training can only be reconstructed — and reconstruction is where errors slip in.

That’s why pasting a document and asking about it is so much more reliable than asking the model what it “knows” about the same document. In the first case it’s reading; in the second it’s remembering, in the loosest possible sense of the word.
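
A rough way to reason about whether material fits in the window is to estimate its token count from its word count. The sketch below assumes the three-quarters-of-a-word average mentioned above, which is only an approximation for English; real products count tokens with the model's own tokenizer. The window size and function names are illustrative.

```python
import math

# Rough English average from the text above; an estimate, not an exact rule.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Estimate the token count of `text` from its whitespace-separated words."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def fits_in_window(texts, window_tokens: int, reserve_for_answer: int = 0) -> bool:
    """Check whether all texts, plus room for the reply, fit in the window."""
    total = sum(estimate_tokens(t) for t in texts)
    return total + reserve_for_answer <= window_tokens


document = "word " * 300  # a stand-in for a 300-word document

print(estimate_tokens(document))                                 # 400
print(fits_in_window([document], 500, reserve_for_answer=50))    # True
print(fits_in_window([document], 420, reserve_for_answer=50))    # False
```

Anything that does not fit has to be trimmed, summarized, or left out, and whatever is left out is back in "remembering" territory.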

What it isn’t

Most expensive mistakes with AI come from one of these four misconceptions:

It isn’t a database. There’s no table of facts inside. Asking for an exact quote, a citation, or a number is asking it to reconstruct one from patterns — sometimes correctly, sometimes not, with the same confident tone either way.
It isn’t a calculator. Arithmetic is a side effect of text patterns, not a built-in operation. Good tools route math to actual calculators; if yours doesn’t, treat any computed figure as a guess.
It isn’t consistent. Ask the same question twice and you may get different answers. There’s deliberate randomness in how it picks each next word, which is part of why the writing feels alive — and why “but it told me X yesterday” is not an argument.
It isn’t aware of itself. Asking a model why it said something produces a plausible explanation, not an inspection of its own internals. The explanation is one more prediction.
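
The "deliberate randomness" in the third point can be sketched in a few lines. This is a simplified illustration, assuming the model has already scored a handful of candidate next words; the scores and candidates here are invented, and the setting usually called temperature controls how adventurous the pick is.

```python
import math
import random


def sample_next(scores: dict, temperature: float, rng: random.Random) -> str:
    """Pick one candidate at random, weighted by softmax(score / temperature)."""
    if temperature <= 0:
        # Treat zero temperature as "always take the top-scoring candidate".
        return max(scores, key=scores.get)
    top = max(scores.values())  # subtract the max for numerical stability
    weights = [math.exp((s - top) / temperature) for s in scores.values()]
    return rng.choices(list(scores), weights=weights, k=1)[0]


# Hypothetical scores for the word that follows "The largest city in France is".
scores = {"Paris": 3.0, "Lyon": 2.0, "Marseille": 1.5}

rng = random.Random()
print([sample_next(scores, 1.0, rng) for _ in range(5)])  # varies run to run
print([sample_next(scores, 0.0, rng) for _ in range(5)])  # always the top pick
```

The same scores can yield different words on different runs, so two answers to the same question can diverge from the first word onward.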

Treat it as a brilliant, well-read colleague with no access to records and no memory of yesterday, and you’ll avoid most of these mistakes.

Where the knowledge lives

A model’s knowledge has a cutoff: the date its training data ends. Anything after that date arrives only through tools — search, file uploads, databases — bolted on around the model. When a product seems to know today’s news, you’re seeing the bolt-ons, not the model.

The bolt-ons have a family name worth knowing: retrieval. The system searches a source — the web, your files, a company wiki — and pastes what it finds into the context window before the model answers. Done well, this converts “remembering” into “reading” and reliability jumps. Done badly, it pastes in the wrong passage and the model fluently summarizes an irrelevant document. Either way, the model itself didn’t get smarter; its inputs got better or worse.
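
The search-then-paste loop can be sketched as follows. This is a minimal illustration, assuming a crude word-overlap score in place of a real search engine; the passages, function names, and prompt wording are all hypothetical.

```python
def score(query: str, passage: str) -> int:
    """Count how many distinct query words also appear in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words)


def retrieve(query: str, passages: list) -> str:
    """Return the passage that shares the most words with the query."""
    return max(passages, key=lambda p: score(query, p))


def build_prompt(query: str, passages: list) -> str:
    """Paste the retrieved passage into the text the model will read."""
    context = retrieve(query, passages)
    return f"Answer using only this passage:\n{context}\n\nQuestion: {query}"


# Hypothetical company wiki.
wiki = [
    "Refunds are issued within 14 days of a return.",
    "Our office is closed on public holidays.",
]

print(build_prompt("how many days do refunds take", wiki))
```

The model only ever sees what `build_prompt` hands it. A scoring function this naive will sometimes pick the wrong passage, and the model will summarize it just as fluently.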

The same applies to memory features. When an assistant “remembers” your preferences across sessions, a tool is storing notes and re-pasting them into the window later. Useful, but it’s filing, not learning. The underlying model is frozen between releases.
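
In code, "filing, not learning" looks roughly like this. The class below is a hypothetical stand-in for a memory feature, not any product's actual design: it keeps plain-text notes and pastes them in front of the next message.

```python
class NoteStore:
    """Hypothetical stand-in for an assistant's memory feature."""

    def __init__(self):
        # Plain text notes kept outside the model; the model itself never changes.
        self.notes = []

    def remember(self, note: str) -> None:
        """File a note for later sessions."""
        self.notes.append(note)

    def build_prompt(self, user_message: str) -> str:
        """Re-paste every stored note ahead of the new message."""
        if not self.notes:
            return user_message
        pasted = "\n".join(f"- {note}" for note in self.notes)
        return f"Known about this user:\n{pasted}\n\n{user_message}"


store = NoteStore()
store.remember("Prefers metric units")
prompt = store.build_prompt("How far is the warehouse?")
print(prompt)
```

Delete the note and the "memory" is gone, because nothing about the model was ever updated.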

Why this model of the model holds up

Architectures change every year; the prediction framing has held. Models in 2026 plan further ahead, use tools mid-answer, and verify some of their own outputs, but each of those is a system built around the predictor, not a replacement for it. When you read about a new release, the productive question is always the same: what did they wrap around the prediction engine this time?

The wrappers are genuinely important. A predictor that can call a calculator stops fumbling arithmetic. A predictor wired to search stops guessing at dates. A predictor that drafts an answer, critiques it, and redrafts catches a real share of its own mistakes. Capability gains are real — but they’re gains in the system, and they inherit the system’s failure modes. A search tool can fetch the wrong page. A verifier can approve a wrong answer that looks right. The fluent-but-wrong risk never goes to zero; it moves around.
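
A calculator wrapper is simple enough to sketch. The version below is an illustration under stated assumptions: the `predictor` argument is a placeholder for whatever calls the model, and the routing rule (anything that parses as plain arithmetic goes to the calculator) is far cruder than what real products do.

```python
import ast
import operator

# Only these four operations are allowed; everything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculate(expression: str):
    """Evaluate plain arithmetic exactly, without running arbitrary code."""

    def evaluate(node):
        if isinstance(node, ast.Expression):
            return evaluate(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand)
        raise ValueError("unsupported expression")

    return evaluate(ast.parse(expression, mode="eval"))


def answer(question: str, predictor) -> str:
    """Send arithmetic to the calculator and everything else to the predictor."""
    try:
        return str(calculate(question))
    except (ValueError, SyntaxError):
        return predictor(question)


def fake_predictor(question: str) -> str:
    """Placeholder for a model call; it returns text, right or wrong."""
    return "a fluent guess"


print(answer("1234 * 5678", fake_predictor))                 # 7006652
print(answer("What is our refund policy?", fake_predictor))  # a fluent guess
```

The arithmetic is now exact, but the failure mode has moved rather than vanished: a question the router fails to recognize as math still goes to the predictor.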

What this means for daily use

The mental model cashes out as a short set of habits:

Give it material, not trust. Paste the document, the data, the email thread. Material it can read beats material it must recall, every time.
Verify anything that leaves the building. Names, numbers, quotes, citations, legal and medical claims — check them against a source before a customer, a regulator, or your accountant sees them.
Use it where drafts are cheap and judgment is yours. First drafts, summaries, options, explanations, code you’ll test — the predictor excels exactly where a wrong answer costs you a minute, not a client.
Don’t ask it to grade itself. “Are you sure?” produces a confident yes or a polite retraction, both generated the same way as the original answer. Verification needs a source or a test, not a vibe.
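
"A source or a test" can be partly automated. The sketch below is one narrow example, assuming you have the source document as text: it flags numbers and quoted phrases in a draft that appear nowhere in the source. The sample texts and function names are invented, and a clean result only means nothing was flagged, not that the draft is correct.

```python
import re

_NUMBER = re.compile(r"\d+(?:[.,]\d+)*")
_QUOTE = re.compile(r'"([^"]+)"')


def unsupported_numbers(draft: str, source: str) -> list:
    """Return numbers that appear in the draft but nowhere in the source."""
    source_numbers = set(_NUMBER.findall(source))
    return [n for n in _NUMBER.findall(draft) if n not in source_numbers]


def unsupported_quotes(draft: str, source: str) -> list:
    """Return quoted phrases in the draft that are not verbatim in the source."""
    return [q for q in _QUOTE.findall(draft) if q not in source]


# Hypothetical source document and model-written summary.
source = ('The contract runs for 24 months. Either party may terminate '
          '"with 60 days written notice".')
draft = ('The contract lasts 36 months and can be ended '
         '"with 60 days written notice".')

print(unsupported_numbers(draft, source))  # ['36']
print(unsupported_quotes(draft, source))   # []
```

A check like this catches the answer-shaped object with the wrong number inside. It does not catch a correct number attached to the wrong claim, so a human still reads anything that leaves the building.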

None of this is cynicism about the technology. Prediction at this scale is a remarkable, genuinely useful capability — it’s just a different kind of machine than the record-keeping software your business already runs on, and it earns its keep fastest for the people who stop expecting it to be one.