0 % vs 40 %: making a RAG agent refuse to hallucinate¶

2026-05-31 · LLM / RAG

A retrieval-augmented agent is only as trustworthy as its behaviour on questions whose answer isn't in the corpus. The failure mode is quiet: instead of saying "I don't know," the model invents a confident, well-formed, wrong answer. This post shows a single guardrail that takes that from common to never — and, crucially, measures it.

Reference architecture: nim-agent-blueprint — agentic RAG on the NVIDIA NIM stack with a built-in eval harness.

The ablation¶

The agent loop is plan → retrieve → generate → validate. The interesting variable is the generation prompt's contract with the retrieved context:

Configuration	Out-of-corpus hallucination rate
Generate freely from context	~40 %
Guarded prompt (answer only from context; otherwise abstain)	0 %

Same model, same retriever, same questions. The only change is a prompt that makes "I can't answer that from the provided sources" a first-class, rewarded output — plus a validate step that checks the answer is grounded in retrieved spans before returning it. On in-corpus questions, retrieval recall@3 stayed at 100 %, so the guardrail buys safety without costing coverage.

Why "just prompt better" isn't the lesson¶

The lesson isn't the prompt — it's that the difference between 40 % and 0 % is invisible without an eval harness. A demo that only asks in-corpus questions looks perfect in both configurations. You only see the 40 % when you deliberately ask things the corpus can't answer and score groundedness. So the blueprint ships with:

retrieval hit-rate (is the answer even retrievable?),
answer groundedness via LLM-as-judge (is the answer supported by what was retrieved?),
latency, and OpenTelemetry traces per agent step.

That's the difference between "it works on my five questions" and "here is the number a partner can hold me to."

Takeaway¶

For enterprise RAG, abstention is a feature, not a failure. Make "I don't know" a rewarded output, validate groundedness before returning, and measure the out-of-corpus rate — it's the number that separates a demo from something you'd put in front of a customer.

→ Runnable blueprint + eval harness: github.com/waynehacking8/nim-agent-blueprint