Hallucinations: when they matter and when they don't

Hallucinations aren't a bug. They're not a flaw that will get fixed in the next model release. They're the nature of how this technology works, and understanding that changes how you work with it.

A language model is a probability engine. Given what's come before, what's the most likely next token? It's not retrieving facts from a database. It's not looking things up. It's generating plausible continuations based on patterns it learned in training. Sometimes those continuations are accurate. Sometimes they're not. The model doesn't know the difference, and in a meaningful sense, it doesn't care. It's trying to be helpful, to complete the pattern, to give you something that feels right.
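
To make the mechanism concrete, here's a deliberately toy sketch in Python: a hand-written probability table standing in for a trained model, with the next token picked by weighted sampling. The tokens and probabilities are invented purely for illustration; real models work over huge vocabularies and learned weights, but the basic move, choose a plausible continuation, is the same.

```python
import random

# A toy "model": for a given prefix, a distribution over possible next tokens.
# Real models learn these probabilities from training data; these are invented
# purely to illustrate the mechanism.
NEXT_TOKEN_PROBS = {
    "The postcode is": [("SW9", 0.4), ("SW2", 0.35), ("SE5", 0.25)],
}

def sample_next_token(prefix: str) -> str:
    """Pick the next token by sampling from the probability distribution."""
    candidates = NEXT_TOKEN_PROBS[prefix]
    tokens = [token for token, _ in candidates]
    weights = [weight for _, weight in candidates]
    return random.choices(tokens, weights=weights, k=1)[0]

# Every candidate is plausible; none of them is checked against reality.
print(sample_next_token("The postcode is"))
```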

This isn't a criticism. It's just the mechanism. And once you understand the mechanism, you can work with it properly.

The day to day

In practice, whether hallucinations matter depends entirely on what you're doing.

Brainstorming? They're not just tolerable, they're the point. That off-piste suggestion, the connection you wouldn't have made, the idea that sounds slightly wrong but sparks something useful. That's what you want. A model that only said verifiably true things would be useless for creative work. The value is in the unexpected leaps, not the accuracy.

Drafting? They matter a bit. You need to read what comes back with a critical eye, catch the things that sound right but aren't, verify anything that's going to go out into the world. But this is just good practice. You'd do the same with a human collaborator.

Research? They matter more. If you're relying on the AI to surface information you don't already know, you need to be careful. Cross-reference. Check sources. Treat the output as a starting point, not an answer.

The skill isn't eliminating hallucinations. It's knowing, for any given task, how much accuracy matters and calibrating your trust accordingly. That judgement is on you. The AI can't make it for you.

The transactional shift

Everything changes when work becomes transactional.

The moment there's money involved, or bookings, or actions that can't easily be undone, "probably right" stops being good enough. A chatbot that occasionally invents a product feature is annoying. A booking system that occasionally invents availability is a disaster.

This is where most AI projects run into trouble. The technology that works brilliantly for conversation and content starts to creak when you need it to do things in the real world. Not because the model got worse, but because the stakes got higher.

When a customer asks "is this room available on the 15th?" the answer needs to be actually true. Not plausibly true. Not pattern-completion true. Actually true, grounded in real data, verifiable against a source of record.

The grounding problem

This is the problem that grounding solves. Instead of asking the model to generate an answer from its training, you give it access to real data and ask it to reason about that data. The model's job shifts from "make something up that sounds right" to "work with what's actually there."
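
As a rough sketch of what that shift looks like, the snippet below answers the availability question from a toy source of record rather than from the model's training: the fact is looked up first, then handed over as the only thing the model is allowed to reason about. The AVAILABILITY table and build_grounded_prompt helper are hypothetical stand-ins for a real booking system and prompt template, not any particular product's API.

```python
from datetime import date

# A toy source of record. In practice this would be the booking system's
# database or API; the dictionary here is purely illustrative.
AVAILABILITY = {
    ("room-101", date(2026, 3, 15)): False,
    ("room-204", date(2026, 3, 15)): True,
}

def build_grounded_prompt(room_id: str, requested: date) -> str:
    """Look the fact up first, then hand it to the model as context to reason about."""
    is_free = AVAILABILITY.get((room_id, requested), False)
    status = "available" if is_free else "not available"
    facts = f"Room {room_id} on {requested.isoformat()}: {status}."
    return (
        "Answer the customer's question using ONLY the facts below. "
        "If the facts don't cover it, say you don't know.\n\n"
        f"Facts: {facts}\n\n"
        f"Question: Is {room_id} available on the {requested.day}th?"
    )

print(build_grounded_prompt("room-204", date(2026, 3, 15)))
```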

We work in travel, and grounding is everything. The difference it makes is hard to overstate. Something as simple as a postcode or an address, the kind of thing you'd assume any AI would get right, becomes a night-and-day difference once you give the model access to search.

Without grounding, it'll give you something plausible. A postcode that looks right, in roughly the right format, for roughly the right area. Confident, fluent, wrong. With web search tools to ground against reality, it checks. It verifies. It gives you the actual postcode, the actual address, the actual thing.

Making web search available as a tool to the model was one of the biggest practical leaps for this kind of work. Not because search is clever, but because it anchors generation to reality. Retrieval-Augmented Generation, RAG, does something similar with your own data. You retrieve relevant information from a knowledge base and include it in the context, so the model is reasoning about real facts rather than inventing plausible ones.
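
Here's a minimal sketch of that retrieval step, with plain keyword overlap standing in for the embeddings and vector search a production RAG pipeline would use. KNOWLEDGE_BASE, retrieve, and build_rag_prompt are illustrative names, not any particular library's API.

```python
# Minimal retrieval-augmented generation loop. Real systems use embeddings and
# a vector store; plain keyword overlap keeps the sketch self-contained.
KNOWLEDGE_BASE = [
    "Check-in opens at 15:00 and check-out is by 11:00.",
    "Late check-out until 13:00 can be requested for a fee.",
    "Breakfast is served from 07:00 to 10:30 in the ground-floor cafe.",
]

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Rank documents by how many query words they share, keep the best few."""
    query_words = set(query.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_rag_prompt(query: str) -> str:
    """Put the retrieved facts in the context so the model reasons over them."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query))
    return (
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer using only the context above."
    )

print(build_rag_prompt("What time is check-out?"))
```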

Done well, grounding dramatically reduces hallucinations for factual queries. Done badly, it just adds a different set of problems. But that's a separate conversation.

The calibration

I think of it as a spectrum. On one end, pure generation: creativity, brainstorming, drafting, exploration. Hallucinations are features, not bugs. On the other end, pure transaction: bookings, payments, actions, commitments. Hallucinations are unacceptable.

Most work sits somewhere in between, and the skill is knowing where. A first draft of an email? Closer to generation. The pricing quote in that email? Closer to transaction. Same email, different calibrations.

The people who struggle with AI are often the ones who haven't made this calibration explicit. They either trust everything, which leads to embarrassing mistakes, or trust nothing, which makes the technology useless. The middle path is knowing what you're doing and adjusting accordingly.

Hallucinations aren't going away. They're not meant to. The question is whether you understand them well enough to work with them anyway.
