The pitch for AI in legal research is speed. Ask a question in plain English, get an answer with citations in seconds. The pitch is real. So is the catch. The answer is wrong often enough that it cannot be trusted on its own, even when it comes from a tool built specifically for lawyers.
In 2024, researchers at Stanford ran the first careful test of the leading legal research tools and found they produced incorrect or poorly grounded answers between 17 and 33 percent of the time. That is roughly one wrong answer in six for the better tool, and closer to one in three for another. These were not general chatbots. They were the paid, purpose-built products, and several vendors had marketed them as “hallucination-free.” The study’s plain conclusion was that those claims were overstated.
For a lawyer, that error rate is not an abstraction. It is the difference between a brief that holds up and one that earns a call from the judge.
The errors are not only the dramatic case of an invented decision. They include citing a real case for a holding it does not contain, mixing up which court decided what, or summarizing a statute that has since been amended. The answer reads as authoritative. The error is buried in a detail that only verification catches.
Why better models will not fix this
The instinct is to wait for a smarter model. That misreads the problem. Most legal AI tools already connect the model to a real database, an approach called retrieval-augmented generation, so the model is meant to answer from real sources rather than from memory. The Stanford results are from tools that already work this way. They still hallucinate.
The reason is that accuracy depends on the whole system, not just the model. The tool has to find the right passage, recognize the controlling authority, and represent it correctly, and then the model has to phrase the answer without drifting from the source. A failure at any step produces a confident, wrong answer. This is the same point we make in our piece on what “zero-hallucination” really means. The reliability comes from how the system is built around the model, not from the model alone.
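To see why a chain of steps is harder to get right than any single step, here is a rough back-of-the-envelope sketch. The per-step success rates are invented for illustration and do not come from the Stanford study, and the calculation assumes the steps fail independently, which real systems may not.

```python
from math import prod


def end_to_end_accuracy(step_accuracies):
    """Chance that every step succeeds, assuming steps fail independently."""
    return prod(step_accuracies)


# Hypothetical success rates for the four steps described above.
# These numbers are illustrative assumptions, not measurements.
steps = {
    "find the right passage": 0.95,
    "recognize the controlling authority": 0.95,
    "represent the source correctly": 0.95,
    "phrase the answer without drifting": 0.95,
}

overall = end_to_end_accuracy(steps.values())
print(f"End-to-end accuracy: {overall:.1%}")
```

Under these assumed numbers, four steps that are each right 95 percent of the time produce a fully correct answer only about 81 percent of the time. Improving the model raises one of those figures, not all of them.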
In Mata v. Avianca (2023), a New York federal judge fined two attorneys and their firm $5,000 after they filed a brief built on six cases ChatGPT had invented, then stood by the citations when the court questioned them. It was the first widely reported case of its kind. It was not the last. A public database maintained by researcher Damien Charlotin now tracks hundreds of court decisions involving AI-fabricated citations, with the pace rising from a couple a week to several a day during 2025. Sanctions have reached five figures, and courts have started faulting lawyers who failed to catch the other side’s fake citations too.
The duty does not move to the tool
Bar regulators have been clear that the responsibility stays with the lawyer. ABA Formal Opinion 512, issued in 2024, reads the duty of competence under Model Rule 1.1 to require that lawyers understand the limits of the AI they use. The duty of candor under Model Rule 3.3, and Rule 11 of the Federal Rules of Civil Procedure, put the lawyer who signs a filing on the hook for what is in it. “The software made it up” is not a defense. It is a description of the thing the lawyer was supposed to catch.
So verification is not optional, and a better tool does not remove it. Every citation an AI produces has to be confirmed against the real source before it goes to a client or a court. The practical question is how to make that step fast enough that people actually do it.
What actually lowers the risk
A tool can make verification easy or hard. An answer that arrives with a citation an attorney can open and read in one click is fast to check. A fluent paragraph with no source behind it is not, and it quietly invites the reader to trust it. So the first thing that lowers risk is grounding: answers tied to specific, authoritative sources that show their work.
The second is what the tool is allowed to use. AI grounded in your own materials (your matters, your precedent, your prior work) pulls from a body of text you trust, not the open internet. When that AI runs on a private platform inside the firm, the sources behind every answer are ones you control and can audit, which makes verification shorter and the result easier to stand behind.
The third is the part no tool removes. A person checks the work. The honest framing is that no system is perfectly free of error, so the goal is not a magic box that never misses. It is answers that are traceable and quick to verify, used by attorneys who still verify them. That is a workflow a firm can defend, and it is the one ABA Opinion 512 effectively describes.
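The grounding and human-review points above can be sketched as a simple triage step. This is a minimal illustration, not a description of any product: the `Claim` structure, the `check_claims` function, the status labels, and the sample document ID are all hypothetical. The check only confirms that a quoted passage appears in a trusted source. It does not decide whether the claim is legally correct, so every item still goes to an attorney.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    text: str                  # the statement made in the AI answer
    source_id: Optional[str]   # ID of the cited document, if any
    quote: str = ""            # passage the answer says supports the claim


def normalize(s: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't matter."""
    return " ".join(s.lower().split())


def check_claims(claims, sources):
    """Label each claim so a reviewer knows where to look first."""
    results = []
    for claim in claims:
        if claim.source_id is None:
            status = "no citation"
        elif claim.source_id not in sources:
            status = "source not found"
        elif normalize(claim.quote) not in normalize(sources[claim.source_id]):
            status = "quote not in source"
        else:
            status = "quote found, attorney review still required"
        results.append((claim.text, status))
    return results


# Hypothetical firm document store: ID -> full text.
sources = {"memo-001": "The notice period under the agreement is thirty days."}

claims = [
    Claim("Notice period is thirty days.", "memo-001", "notice period under the agreement is thirty days"),
    Claim("Notice period is ten days.", "memo-001", "notice period is ten days"),
    Claim("A court has upheld this clause.", "case-999", "upheld"),
    Claim("This is standard practice.", None),
]

for text, status in check_claims(claims, sources):
    print(f"{status:45} | {text}")
```

The design choice worth noting is that no status means "approved". The best outcome the check can report is that the quoted text exists where the answer says it does, which shortens the attorney's review without replacing it.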
See How Cognetryx Grounds Answers in Sources You Can Check
Cognetryx runs inside your firm’s network and answers from your own approved documents, with a citation an attorney can open and verify for every claim. The verification step stays with your team, and the tool is built to make it fast.