Your AI agent confidently fabricates a non-existent API endpoint. It cites a research paper that was never written. It gives you three different, incorrect birth dates for the same person.
These hallucinations aren’t a mysterious glitch. They are a predictable outcome of how we train and evaluate large language models. As an industry, we’ve inadvertently built systems that are incentivized to guess rather than admit uncertainty. The good news is, as developers, we can directly counteract this.
The Root Cause: We Reward Guessing
In a recent paper, OpenAI confirmed what many in the community have suspected: language models hallucinate because our standard training and evaluation procedures reward them for it. Most benchmarks and leaderboards grade models like multiple-choice tests with no penalty for wrong answers. A model that guesses on a question it doesn’t know might get lucky and score a point. A model that honestly says, ‘I don’t know,’ is guaranteed a zero.
Over thousands of questions, the guessing model climbs the leaderboards, creating a powerful incentive for developers to optimize for accuracy above all else—even at the cost of honesty.
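To see how that incentive plays out, here is a toy scoring sketch. The numbers are invented for illustration, not taken from the paper: under accuracy-only grading, a model that always guesses beats one that abstains when unsure, even though the abstainer never gives a wrong answer.

```python
# Toy illustration (hypothetical numbers): accuracy-only scoring rewards guessing.
# The "guesser" answers every question and gets lucky on some it doesn't know.
# The "honest" model abstains on those same unknown questions.

questions = 1000
known = 600              # questions both models actually know
lucky_guess_rate = 0.25  # chance a blind guess happens to be right

guesser_score = known + (questions - known) * lucky_guess_rate  # 600 + 100 = 700
honest_score = known                                            # abstentions score 0

guesser_errors = (questions - known) * (1 - lucky_guess_rate)   # 300 confident errors
honest_errors = 0

print(f"Guesser: {guesser_score:.0f}/1000 correct, {guesser_errors:.0f} confident errors")
print(f"Honest:  {honest_score:.0f}/1000 correct, {honest_errors} confident errors")
# The guesser tops the leaderboard despite producing 300 hallucinations.
```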
This problem is compounded by the fundamental nature of LLMs. They are stochastic next-word predictors, not truth engines. Their primary function is to generate the most plausible next token given the patterns in their training data. They model language, not reality. When the truthful answer is also the statistically likely one, we get a useful response. When it isn't, the model still generates the shape of a plausible answer, filling in the blanks with fabrications.
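A toy sketch makes the point concrete. The probabilities below are invented for illustration; the takeaway is that greedy decoding picks the statistically likely continuation whether or not it happens to be true.

```python
# Toy next-token model (invented probabilities): it ranks continuations by
# plausibility learned from text, with no notion of whether they are true.
next_token_probs = {
    "The capital of Australia is": {
        "Sydney": 0.55,    # written often in training text, but wrong
        "Canberra": 0.40,  # correct, but written less often
        "Melbourne": 0.05,
    }
}

def complete(prompt: str) -> str:
    # Greedy decoding: choose the single most probable continuation.
    probs = next_token_probs[prompt]
    return max(probs, key=probs.get)

print(complete("The capital of Australia is"))  # -> "Sydney": plausible, not factual
```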
Your First Line of Defense: The Power of Permission
While fixing industry-wide evaluation metrics is a long-term goal, you can make an immediate impact. Your agents hallucinate because their training has conditioned them to provide an answer, no matter what. You must explicitly give them permission to be uncertain.
Your agents already know how to say ‘I don’t know.’ You just need to tell them it’s not only acceptable but required.
This is where clear, direct system prompts become your most powerful tool. Instead of just defining a role, build behavioral guardrails into your agent’s core instructions.
Basic Directive: The Uncertainty Clause
Start by adding a simple directive to your system prompts:
If you do not know the answer to a question or are uncertain about a fact, you must state that you do not have the information. Do not invent facts or make up information.
This single instruction creates a new behavioral pathway, giving the model an explicit alternative to confabulation.
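In practice, the clause simply becomes part of the system message, whatever client you use. Here is a minimal sketch using the OpenAI Python SDK; the model name, role prompt, and example question are placeholders to adapt to your own stack.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

UNCERTAINTY_CLAUSE = (
    "If you do not know the answer to a question or are uncertain about a fact, "
    "you must state that you do not have the information. "
    "Do not invent facts or make up information."
)

def ask(question: str, role_prompt: str = "You are a helpful research assistant.") -> str:
    # The uncertainty clause is appended to the agent's core instructions.
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; use whatever model your agent runs on
        messages=[
            {"role": "system", "content": f"{role_prompt}\n\n{UNCERTAINTY_CLAUSE}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask("What was Acme Corp's Q3 2019 revenue?"))
```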
Advanced Directive: Mandate Verification
For more critical tasks, raise the standard from admitting uncertainty to actively verifying information. This is especially crucial in multi-agent systems where one agent’s hallucination can cascade through the entire workflow.
Consider an agent responsible for summarizing financial reports:
You are a Financial Analyst Agent. Your task is to extract key figures and summarize quarterly earnings reports. For every financial metric you state (e.g., revenue, net income, EPS), you must cite the exact page and section of the source document. If a figure cannot be verified in the provided document, you must explicitly state: ‘[METRIC] could not be verified in the source document.’
This directive doesn’t just discourage hallucination; it forces the agent to ground its output in a verifiable source, creating a chain of accountability.
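You can back the directive with a lightweight output check. The sketch below is a hypothetical post-processor, not a standard: the citation format and regular expressions are assumptions, and it simply flags any dollar figure in the agent's summary that isn't followed by a page citation or an explicit 'could not be verified' note.

```python
import re

# Hypothetical convention: every metric must be followed by a citation like
# "(p. 12, Revenue section)" or an explicit "could not be verified" statement.
CITATION = re.compile(r"\(p\.\s*\d+,\s*[^)]+\)")
UNVERIFIED = re.compile(r"could not be verified", re.IGNORECASE)
DOLLAR_FIGURE = re.compile(r"\$[\d,.]+\s*(?:billion|million|thousand)?", re.IGNORECASE)

def flag_unverified_metrics(summary: str) -> list[str]:
    """Return sentences that state a dollar figure without a citation or disclaimer."""
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", summary):
        if DOLLAR_FIGURE.search(sentence) and not (
            CITATION.search(sentence) or UNVERIFIED.search(sentence)
        ):
            flagged.append(sentence)
    return flagged

summary = (
    "Revenue was $4.2 billion (p. 3, Consolidated Statements). "
    "Net income was $310 million."  # no citation -> should be flagged
)
print(flag_unverified_metrics(summary))
```

Routing flagged sentences back to the agent, or to a human reviewer, keeps an unverified figure from cascading into downstream agents.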
The Precision vs. Recall Trade-Off
Implementing these directives involves a conscious design choice. You are tuning your agent for higher precision (the answers it gives are more likely to be correct) at the expense of recall (it may answer fewer questions overall).
An agent that frequently says ‘I don’t know’ may seem less capable at first glance. But a reliable agent that knows its own limits is infinitely more valuable than a confident liar. As developers, our goal isn’t to build an AI that seems omniscient, but one that is demonstrably trustworthy within its domain.
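To make the trade-off concrete, here is a small sketch with made-up evaluation results, where 'I don't know' counts as an abstention: precision is computed only over the questions the agent actually answered, recall over all questions asked.

```python
# Made-up evaluation results: each entry is (agent_answered, answer_was_correct).
# Abstentions ("I don't know") are recorded as (False, False).
results = [(True, True)] * 60 + [(True, False)] * 5 + [(False, False)] * 35

answered = [correct for did_answer, correct in results if did_answer]
precision = sum(answered) / len(answered)  # correct / questions answered
recall = sum(answered) / len(results)      # correct / all questions asked

print(f"Precision: {precision:.2f}")  # 0.92 -- the answers it gives are usually right
print(f"Recall:    {recall:.2f}")     # 0.60 -- but many questions go unanswered
```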
Building a More Honest AI
Hallucinations are not inevitable. They are an artifact of an incentive structure that we are only now beginning to dismantle. By giving our agents clear directives to admit uncertainty and verify their claims, we take a critical step toward building more reliable, honest, and ultimately more useful AI systems.
It starts with a simple instruction: give your agent permission to say, ‘I don’t know.’
I hope you found this article helpful. If you want to take your agentic AI to the next level, consider booking a consultation or subscribing to premium content.