
Why AI Legal Research Still Gets Cases Wrong

Even the AI built specifically for legal research is wrong often enough to matter: roughly one query in six for the best-performing tools Stanford tested. Here is why it happens, what your duty of competence requires, and what actually lowers the risk.

The tools are useful. They are also wrong often enough that a confident answer cannot be the last word. The gap between those two facts is where the risk lives.

The pitch for AI in legal research is speed. Ask a question in plain English, get an answer with citations in seconds. The pitch is real. So is the catch. The answer is wrong often enough that it cannot be trusted on its own, even when it comes from a tool built specifically for lawyers.

In 2024, researchers at Stanford ran what they described as the first preregistered empirical evaluation of the leading AI legal research tools and found that they produced incorrect or poorly grounded answers between 17 and 33 percent of the time. That is roughly one wrong answer in six at the low end of that range and closer to one in three at the high end. These were not general chatbots. They were paid, purpose-built products, and several vendors had marketed them as “hallucination-free.” The study’s plain conclusion was that those claims were overstated.

For a lawyer, that error rate is not an abstraction. It is the difference between a brief that holds up and one that earns a call from the judge.

⚖️ What “hallucination” means here

It is not only the dramatic case of an invented decision. It includes citing a real case for a holding it does not contain, mixing up which court decided what, or summarizing a statute that has since been amended. The answer reads as authoritative. The error is buried in a detail that only verification catches.

Why better models will not fix this

The instinct is to wait for a smarter model. That misreads the problem. Most legal AI tools already connect the model to a real database, an approach called retrieval-augmented generation, so the model is meant to answer from real sources rather than from memory. The Stanford results are from tools that already work this way. They still hallucinate.

The reason is that accuracy depends on the whole system, not just the model. The tool has to find the right passage, recognize the controlling authority, and represent it correctly, and then the model has to phrase the answer without drifting from the source. A failure at any step produces a confident, wrong answer. This is the same point we make in our piece on what “zero-hallucination” really means. The reliability comes from how the system is built around the model, not from the model alone.
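To see why the surrounding system matters as much as the model, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that the four steps just described fail independently, and it assigns each one a hypothetical success rate. None of these numbers come from the Stanford study or from any real product. Under those assumptions, end-to-end accuracy is the product of the per-step rates, so several individually strong steps can still add up to a noticeable error rate.

```python
from math import prod

# Hypothetical per-step success rates, for illustration only.
# These are NOT measurements of any real tool or of the Stanford study.
STEP_SUCCESS = {
    "retrieve the right passage": 0.97,
    "recognize the controlling authority": 0.95,
    "represent the source correctly": 0.96,
    "phrase the answer without drifting": 0.97,
}


def end_to_end_accuracy(step_success: dict[str, float]) -> float:
    """Probability that every step succeeds, assuming the steps fail independently."""
    return prod(step_success.values())


if __name__ == "__main__":
    accuracy = end_to_end_accuracy(STEP_SUCCESS)
    print(f"end-to-end accuracy: {accuracy:.3f}")           # 0.858
    print(f"chance of a wrong answer: {1 - accuracy:.3f}")  # 0.142
```

The specific figure is not the point; the shape is. Upgrading only the model, which mostly affects the last step, leaves the other three failure points where they were.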

The cost of trusting the output

In Mata v. Avianca (2023), a New York federal judge fined two attorneys and their firm $5,000 after they filed a brief built on six cases ChatGPT had invented, then continued to stand by those cases after the court questioned whether they existed. It was the first widely reported case. It was not the last. A public database maintained by researcher Damien Charlotin now tracks hundreds of court decisions involving AI-fabricated citations, and during 2025 the pace rose from a couple a week to several a day. Sanctions have reached five figures, and courts have also started faulting lawyers who failed to catch the other side’s fake citations.

The duty does not move to the tool

Bar regulators have been clear that the responsibility stays with the lawyer. ABA Formal Opinion 512, issued in 2024, reads the duty of competence under Model Rule 1.1 to require that lawyers understand the capabilities and limits of the AI tools they use. Model Rule 3.3’s duty of candor toward the tribunal and Rule 11 of the Federal Rules of Civil Procedure make the lawyer who signs a filing responsible for what is in it. “The software made it up” is not a defense. It is a description of the thing the lawyer was supposed to catch.

So verification is not optional, and a better tool does not remove it. Every citation an AI produces has to be confirmed against the real source before it goes to a client or a court. The practical question is how to make that step fast enough that people actually do it.

What actually lowers the risk

A tool can make verification easy or hard. An answer that arrives with a citation an attorney can open and read in one click is fast to check. A fluent paragraph with no source behind it is not, and it quietly invites the reader to trust it. So the first thing that lowers risk is grounding: answers tied to specific, authoritative sources that show their work.

The second is what the tool is allowed to use. AI grounded in your own materials (your matters, your precedent, your prior work) draws on a body of text you already trust rather than the open internet. When that AI runs on a private platform inside the firm, the sources behind every answer are ones you control and can audit, which shortens verification and makes the result easier to stand behind.
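Part of these first two safeguards, grounding and an approved set of sources, can be checked mechanically. The sketch below assumes a hypothetical answer format in which every claim carries the ID of the document it relies on and the passage it quotes; the `Claim` type, the `APPROVED_SOURCES` corpus, and the `flag_ungrounded` function are illustrative names, not any product’s API. It flags claims that cite a document outside the approved set, or that quote text which does not appear in the cited document. Passing this check does not mean a claim is right: a real source can still be cited for a point it does not make, which is why an attorney still reads it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One statement in an AI answer plus the source it says it relies on (hypothetical format)."""
    text: str
    source_id: str
    quoted_passage: str


# Hypothetical approved corpus: document ID -> full text the firm has vetted.
APPROVED_SOURCES = {
    "memo-2023-014": "The limitations period for this claim is three years from discovery.",
    "brief-2022-007": "The court held that the clause was unenforceable as written.",
}


def normalize(s: str) -> str:
    """Collapse whitespace and case so trivial formatting differences do not cause false flags."""
    return " ".join(s.split()).lower()


def flag_ungrounded(claims: list[Claim], sources: dict[str, str]) -> list[tuple[Claim, str]]:
    """Return (claim, reason) for every claim that fails the mechanical grounding check."""
    problems = []
    for claim in claims:
        source_text = sources.get(claim.source_id)
        if source_text is None:
            problems.append((claim, "cites a document outside the approved set"))
        elif normalize(claim.quoted_passage) not in normalize(source_text):
            problems.append((claim, "quoted passage not found in the cited document"))
    return problems


if __name__ == "__main__":
    answer = [
        Claim("Three-year limitations period.", "memo-2023-014", "three years from discovery"),
        Claim("The clause is enforceable.", "brief-2022-007", "the clause was enforceable"),
        Claim("Fees are recoverable.", "unknown-doc-99", "fees are recoverable"),
    ]
    for claim, reason in flag_ungrounded(answer, APPROVED_SOURCES):
        print(f"{claim.source_id}: {reason}")
```

In the usage example, the second claim is flagged because its quote says “enforceable” while the vetted document says “unenforceable,” and the third is flagged because it points outside the approved corpus. Both are the kind of detail a fluent paragraph can hide.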

The third is the part no tool removes. A person checks the work. The honest framing is that no system is perfectly free of error, so the goal is not a magic box that never misses. It is answers that are traceable and quick to verify, used by attorneys who still verify them. That is a workflow a firm can defend, and it is the one ABA Opinion 512 effectively describes.
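The third safeguard is a workflow rule rather than a feature, but it can still be enforced in software. As a minimal sketch, assuming a hypothetical `Citation` record and `ready_to_file` check invented for illustration, a draft stays blocked until every citation has been confirmed against its source by a named attorney, regardless of what the AI tool reported about it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    """A citation in a draft and who, if anyone, has checked it against the source (hypothetical)."""
    cite: str
    verified_by: Optional[str] = None  # name of the attorney who read the actual source


def unverified(citations: list[Citation]) -> list[str]:
    """Citations no attorney has yet confirmed (None or an empty name both count as unverified)."""
    return [c.cite for c in citations if not c.verified_by]


def ready_to_file(citations: list[Citation]) -> bool:
    """A draft goes out only when every citation has a named human verifier."""
    return not unverified(citations)


if __name__ == "__main__":
    draft = [
        Citation("Mata v. Avianca, Inc., 678 F. Supp. 3d 443 (S.D.N.Y. 2023)", verified_by="A. Attorney"),
        Citation("[second citation placeholder]"),
    ]
    print(ready_to_file(draft))  # False
    print(unverified(draft))     # ['[second citation placeholder]']
```

The gate deliberately ignores any confidence score the tool might report. The only thing that clears a citation is a person’s name next to it, which mirrors the point that responsibility stays with the lawyer who signs.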

See How Cognetryx Grounds Answers in Sources You Can Check

Cognetryx runs inside your firm’s network and answers from your own approved documents, with a citation an attorney can open and verify for every claim. The verification step stays with your team, and the tool is built to make it fast.

Book a Free AI Strategy Assessment →
Keith Kennedy

Founder & CEO, Cognetryx (CISSP)

Keith founded Cognetryx to build private AI that regulated institutions can stand behind. He writes about the architecture of trustworthy AI, where grounding, retrieval, and verification decide whether an answer holds up.

What Lawyers Ask About AI Research Accuracy

How often does AI get legal research wrong?

A 2024 Stanford RegLab study tested the leading purpose-built legal research tools and found they produced incorrect or misgrounded answers between 17 and 33 percent of the time. That is roughly one in six queries for the better tool and closer to one in three for another. General chatbots not built for law do worse. The tools are useful; the catch is that a confident answer is not the same as a correct one.

Why do legal AI tools still hallucinate if they use real databases?

Connecting a model to a real database, often called retrieval-augmented generation, helps but does not eliminate errors. The system can pull the wrong passage, miss the controlling authority, or summarize a real case incorrectly, and the model can still phrase a wrong answer with full confidence. Accuracy depends on the quality of the retrieval and the grounding, not just on having a database attached.

Can lawyers be sanctioned for AI hallucinations?

Yes. Courts have sanctioned lawyers for filing briefs built on cases AI invented, beginning with Mata v. Avianca in 2023 and accelerating sharply through 2025. A public database tracks hundreds of such decisions worldwide. Model Rule 3.3 (candor toward the tribunal) and Rule 11 of the Federal Rules of Civil Procedure put the responsibility on the lawyer who signs the filing, not on the tool.

Does the duty of competence require checking AI output?

Yes. ABA Formal Opinion 512 (2024) reads the duty of competence under Model Rule 1.1 to mean a lawyer must understand the limits of the AI tools they use, and the duties of supervision and candor mean a lawyer must review AI output before relying on it. In practice that means confirming every citation against the actual source before it goes to a client or a court.

How can a firm lower AI hallucination risk?

Use AI that is grounded in authoritative sources and that cites where each answer came from, so an attorney can open the source and check it. Keep the system inside an environment where it draws on your own vetted materials. And keep a human verification step in the workflow. No tool is perfectly hallucination-free, so the goal is answers that are traceable and quick to check, paired with a person who checks them.

See how Cognetryx grounds answers in your own approved sources, with a citation an attorney can open and check. Explore private AI for legal →

Sources

  1. Stanford HAI, “AI on Trial: Legal Models Hallucinate in 1 out of 6 (or More) Benchmarking Queries,” 2024. hai.stanford.edu
  2. Stanford RegLab, “Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools,” 2024. reglab.stanford.edu
  3. Mata v. Avianca, Inc., 678 F. Supp. 3d 443 (S.D.N.Y. 2023). law.justia.com
  4. Damien Charlotin, AI Hallucination Cases Database. damiencharlotin.com
  5. American Bar Association, Formal Opinion 512, “Generative Artificial Intelligence Tools,” 2024; ABA Model Rules of Professional Conduct 1.1 and 3.3. americanbar.org