What “Zero-Hallucination” Really Means When an AI Answer Has to Hold Up

Even the AI tools built specifically for legal research still got 17 to 33 percent of answers wrong in a 2024 Stanford study. Better models alone will not move that number. The cause sits one layer deeper, in how the system around the model is put together, and that distinction decides whether a deployment will hold up when an examiner starts asking questions.

Every answer the platform gives is tied to the page it came from. The link sits next to the answer, ready to open.

Most conversations about AI end with the same quiet question. The person asking is usually the one whose name goes on the report, the call, or the filing. They want to use AI. They cannot afford to be wrong about what it told them. The question they ask is simple: how do I know the answer is right?

That question has only gotten harder in the last two years. In May 2024, researchers at Stanford’s RegLab and Human-Centered AI Institute ran a test on the AI tools sold to law firms by LexisNexis and Thomson Reuters. These products cost real money. Their vendors promise grounded answers built on real cases. The study found those tools still returned wrong answers on 17 to 33 percent of queries.

One in six. Sometimes one in three. From the AI tools made just for legal work. With fake case names. With wrong page numbers. With smooth answers about laws that say something different from what the AI claimed.

📌 What “hallucination” means in plain words

A hallucination is a clear, smooth, well-written AI response that the source material does not support. A made-up case name. A rule that was never written. A neat paragraph pointing to a page that does not exist. The answer looks fine until someone checks it. In work where the check comes from a regulator, the check arrives too late.

Where the problem actually lives

The instinct is to blame the AI itself. Bigger models, longer memory, smarter prompts: those interventions will not move the number. The cause sits in the system architecture around the model.

Most office AI follows the same pattern. There is a database that holds chunks of your text. There is a language model that writes the answer. There is a thin layer of code that ties them together. The demo runs fast. The failure mode is built right in.

Here is the order of operations. A user asks a question. The system finds chunks of text that look like the question. The model writes an answer. The system attaches a link at the end. In other words, the system produces fluent prose first and looks for matching proof second. If the proof is thin, the model still writes the answer. That behavior is what generative models are trained to do, and the implementation-tax piece traces the same gap from demo through production.
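That order of operations can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's actual code. The word-overlap search stands in for a real database lookup, and stub_model is a hypothetical stand-in for a language model that answers confidently no matter what it is handed.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    page: str
    text: str


def word_overlap(a: str, b: str) -> int:
    """Count shared lowercase words: a crude stand-in for similarity search."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def retrieve(question: str, chunks: list) -> Chunk:
    """Return the chunk that looks most like the question."""
    return max(chunks, key=lambda c: word_overlap(question, c.text))


def stub_model(question: str, context: str) -> str:
    """Hypothetical model: writes a fluent answer whatever the context says."""
    return "Yes, the retention period is seven years."


def generate_then_cite(question: str, chunks: list) -> dict:
    chunk = retrieve(question, chunks)             # 1. find text that looks similar
    answer = stub_model(question, chunk.text)      # 2. write the answer regardless
    return {"answer": answer, "link": chunk.page}  # 3. attach a link at the end


chunks = [
    Chunk("policy.pdf#p4", "The retention period for wire records is five years."),
    Chunk("handbook.pdf#p9", "Vacation requests need manager approval."),
]
result = generate_then_cite("What is the retention period for wire records?", chunks)
print(result)
```

In this toy run the answer says seven years while the linked page says five. Nothing in the pipeline compares the two, so the link looks official and proves nothing.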

Why most AI citations do not count

A link earns trust only when it forces the answer to match the source page. Most office AI lacks that constraint. The model writes the response. A search step finds the chunk that looks closest. A footnote-style link gets attached. The link may back up the answer. The link may point to a related topic. The link may point to a page with the opposite claim. The system has no way to tell the difference.

People who do this work for a living spot the problem quickly. An official-looking link that fails to match the answer puts the work of checking back on the reader while creating the appearance that the checking has been done. The Stanford team measured exactly this dynamic. Their finding was that the citations in those tools could not be trusted without a second pair of eyes on every answer. Compliance officers feel the strain first, because they are the ones who have to defend the answer when the regulator asks.

The thing that matters

The ordering of operations is what matters. When verification runs before generation, and when the model is constrained to material the verification step actually returned, fabrication has nowhere to live. The architectural commitment to that ordering is what separates a tool that demos well from a tool that holds up in production.

How Cognetryx orders the work

Cognetryx finds the source first. The system checks what the user is allowed to see. Only then does the model write anything. The model never sees pages outside the user’s permissions, because the user’s identity is part of the search itself.
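Here is a rough sketch of that ordering, assuming a simple group-based permission model. The Page type, the group names, and the echo_model function are all hypothetical and invented for illustration. The article does not describe Cognetryx's internals at this level of detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    page_id: str
    text: str
    allowed_groups: frozenset  # hypothetical: groups cleared to read this page


def search(query, pages, user_groups):
    """The user's groups are an input to the search, so pages the user
    cannot read never become candidates in the first place."""
    terms = set(query.lower().split())
    visible = [p for p in pages if p.allowed_groups & user_groups]
    return [p for p in visible if terms & set(p.text.lower().split())]


def answer(query, pages, user_groups, model):
    sources = search(query, pages, user_groups)  # 1. find sources, scoped to the user
    if not sources:
        return None                              # 2. nothing cleared, nothing written
    return model(query, sources)                 # 3. only now does the model run


def echo_model(query, sources):
    """Hypothetical model stub: reports which pages it was shown."""
    return sorted(p.page_id for p in sources)


pages = [
    Page("hr/salary-bands#p2", "salary bands for analysts", frozenset({"hr"})),
    Page("ops/handbook#p7", "analysts submit salary questions to hr", frozenset({"hr", "staff"})),
]

staff_view = answer("analysts salary", pages, {"staff"}, echo_model)
hr_view = answer("analysts salary", pages, {"hr"}, echo_model)
print(staff_view, hr_view)
```

The design point is that filtering happens inside the search step. The model is never asked to keep a secret, because the restricted page never reaches it.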

The architecture rests on three design choices working together. Retrieval and permission checks run before generation, so the model only sees material the user is cleared to see. A knowledge graph built during document ingestion maps how your files relate to each other, so the platform can answer questions that span multiple sources. And every response is bound at the model interface to the specific passages that support it, so the answer cannot drift away from what the sources actually say. The result is one a leader can verify in any interaction: every answer carries a link to the specific passage that backs up the claim. The link points to the page, and to the part of the page, that supports the response.
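One way to picture the third choice is a check that every claim carries an exact quote from the passage it cites. The sketch below is an assumption about how such a binding could be tested, with hypothetical names throughout. It is a plain substring check, far simpler than anything a production system would do at the model interface.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str        # sentence the model wants to say
    passage_id: str  # passage it cites
    quote: str       # exact span offered as support


def normalize(s: str) -> str:
    """Lowercase and collapse whitespace so layout differences do not matter."""
    return " ".join(s.lower().split())


def unsupported_claims(claims, passages):
    """Return claims whose quote is not found verbatim in the cited passage."""
    bad = []
    for c in claims:
        passage = passages.get(c.passage_id, "")
        # An empty quote would match any passage, so reject it explicitly.
        if not c.quote.strip() or normalize(c.quote) not in normalize(passage):
            bad.append(c)
    return bad


passages = {"policy#p4": "Wire records must be retained for five years."}
claims = [
    Claim("Wire records are kept five years.", "policy#p4", "retained for five years"),
    Claim("Wire records are kept seven years.", "policy#p4", "retained for seven years"),
    Claim("Audits happen yearly.", "policy#p9", "audited annually"),
]
rejected = unsupported_claims(claims, passages)
print([c.text for c in rejected])
```

A claim that fails the check would be dropped or sent back for revision. It would never be shown with a link it has not earned.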

Because the knowledge graph is built on your own files, a question like “what is our stance on this kind of case” draws from your real cases, your real policies, and your real past calls. The answer comes from your own institution’s history, with no reach into generic training material about what banks or hospitals or law firms tend to do. For a closer look at the design, the how it works page walks through it.
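To make the idea of a knowledge graph concrete, here is a toy version. The document names and relations are invented for illustration, and a real graph built at ingestion would carry far more detail. The sketch only shows how links between files let one question reach several sources.

```python
from collections import deque

# Hypothetical edges recorded at ingestion: (source document, relation, target document)
EDGES = [
    ("loan-policy-2023", "supersedes", "loan-policy-2019"),
    ("exception-memo-14", "cites", "loan-policy-2023"),
    ("board-minutes-0612", "approves", "exception-memo-14"),
]


def build_graph(edges):
    """Undirected adjacency map, so related files can be reached from either side."""
    graph = {}
    for src, _relation, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set()).add(src)
    return graph


def related_documents(graph, start, max_hops=2):
    """Breadth-first walk: every document within max_hops of the starting one."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return seen - {start}


graph = build_graph(EDGES)
one_hop = related_documents(graph, "loan-policy-2023", max_hops=1)
two_hops = related_documents(graph, "loan-policy-2023", max_hops=2)
print(sorted(one_hop), sorted(two_hops))
```

Starting from the current policy, one hop reaches the older policy and the memo that cites it. A second hop reaches the board minutes that approved the memo, which is the kind of multi-source trail a question about "our stance" would need.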

The question your team has to answer is whether the AI’s response points to a page you can read, check, and hand to an examiner. The architecture is what makes that question resolvable with a click.

What the platform does when no source fits

New users in a demo are often caught off guard by this part. When the platform has no real page that answers a question, the system says so plainly and declines to answer. You asked this. We searched. Nothing in your sources fits. Here is what we can say, and here is what we cannot.
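The refusal path can be sketched as a gate in front of the model. Everything here is an assumption made for illustration: the scoring function is a crude word match, the 0.5 threshold is arbitrary, and stub_model is a hypothetical stand-in. The point is only that the model is never called when no source clears the bar.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    status: str  # "answered" or "no_source"
    text: str
    sources: list = field(default_factory=list)


def score(question: str, passage: str) -> float:
    """Fraction of question words found in the passage (stand-in for real relevance scoring)."""
    q = set(question.lower().split())
    return len(q & set(passage.lower().split())) / len(q) if q else 0.0


def respond(question, passages, model, min_score=0.5):
    hits = {pid: text for pid, text in passages.items() if score(question, text) >= min_score}
    if not hits:
        # No source fits: say so plainly and skip generation entirely.
        return Response("no_source", "We searched your sources and found nothing that answers this question.")
    return Response("answered", model(question, hits), sorted(hits))


calls = []


def stub_model(question, hits):
    """Hypothetical model stub; records that it was called."""
    calls.append(question)
    return "Answer drawn from: " + ", ".join(sorted(hits))


passages = {"fees.pdf#p2": "wire transfer fees are 25 dollars per transfer"}
hit = respond("wire transfer fees", passages, stub_model)
miss = respond("crypto custody policy", passages, stub_model)
print(hit.status, miss.status, calls)
```

Because the refusal is a distinct status and not a vaguely worded answer, downstream code and human readers can tell at a glance which replies rest on a source and which do not.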

Every refusal is logged with the same fidelity as every answer. Every question, every search, every reply, every “we don’t know” gets saved and can be exported as a record. That helps the analyst, because the platform tells them clearly when they need to do the work themselves. It helps the compliance officer, because the record captures everything the AI said and every time the AI was asked something and held back.
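A record like that can be as simple as one structured entry per interaction, written the same way whether the platform answered or held back. The field names and the JSON Lines export below are assumptions chosen for illustration, not a description of the platform's actual log format.

```python
import json
from datetime import datetime, timezone


class AuditLog:
    """Keeps one record per interaction, answers and refusals alike."""

    def __init__(self):
        self._records = []

    def record(self, user, question, searched, status, reply, sources=()):
        self._records.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "question": question,
            "searched": list(searched),  # what the search step looked through
            "status": status,            # "answered" or "no_source"
            "reply": reply,
            "sources": list(sources),    # empty for a refusal
        })

    def export_jsonl(self) -> str:
        """One JSON object per line, ready to hand over as a record."""
        return "\n".join(json.dumps(r, sort_keys=True) for r in self._records)


log = AuditLog()
log.record("analyst1", "wire transfer fees", ["fees.pdf"], "answered",
           "Fees are 25 dollars per transfer.", ["fees.pdf#p2"])
log.record("analyst1", "crypto custody policy", ["fees.pdf"], "no_source",
           "Nothing in your sources answers this question.")
exported = log.export_jsonl()
print(exported)
```

Both entries carry the same fields, so a reviewer reading the export sees a refusal with exactly as much detail as an answer.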

A clean “we don’t know” is its own form of accuracy. In work that goes to a regulator, an honest “the source does not say that” carries real value. It tells the analyst exactly where to look next, and it leaves no fluent paragraph for an examiner to later prove wrong. Cloud AI tuned for helpfulness above all will tend to fill the space anyway. The output that fills the space is what we call a hallucination.

What this changes for a decision-maker

The reason this matters at the top of the building, and beyond the IT room, is that it changes what AI is allowed to be used for. When every answer ties back to a real page, three things happen at once.

That last point gets missed in vendor talks. Most AI failures in regulated work trace back to a trust gap between the system and the people who use it, more than to a failure of the technology in a controlled test. The MIT GenAI Divide report describes the same dynamic: tools that demo well still stall in the workflow when staff cannot fully trust them. Architectural trust changes that picture. When the system itself enforces the rule that every answer ties to a verifiable source, the staff who would otherwise hold back have a reason to engage. Adoption follows the trust, and the trust is structural.

We talk about zero-hallucination as a property of the architecture, more than as a marketing line, because we have watched the alternative play out. A system that gets a SAR write-up wrong once, a clinical note wrong once, a contract line wrong once, a board memo wrong once: that system gets quietly turned off. The reason has more to do with what comes next than with the mistake itself. The people who sign the work need a tool they can stand behind, and a tool that might be wrong in ways they will not catch in time fails that bar. Cognetryx is built so the question of unseen mistakes is closed at the architecture level. Every answer arrives with a verifiable link to its source. Every refusal is logged with the same fidelity as any answer. The people responsible for the work have what they need to defend it, in either case.

See What This Looks Like on Your Own Data

Cognetryx runs inside your own network and ties every AI answer to the page in your own files that supports it. Your data stays behind your firewall, every response carries a verifiable source, and your team has the audit trail to defend any output the platform produces. Bring your own files and see it run.

Brent Fisher

Co-Founder & Head of Go-to-Market, Cognetryx

Brent spent twenty years in community banking and marketing for regulated industries before co-founding Cognetryx. He works with leadership teams, boards, and decision-makers on the part of the AI conversation that opens once the demo ends and the question of trust takes over. He has seen the documentation burden and the cost of slow answers from inside a real institution, and he brings that view to every conversation on the vendor side.
