
How Law Firms Can Deploy Private LLMs Without Sacrificing Data Sovereignty

Private and sovereign are different claims. A model can run inside the firm and still sit under a jurisdiction that can compel the data. Here's where the gap shows up, and how to close it.

[Figure: A law firm's internal network linked to a global jurisdiction boundary, illustrating data sovereignty for private LLMs]

When a law firm evaluates a cloud AI tool, the question that usually comes up is whether the vendor trains on the data. That matters, and it's the easy part. The harder questions are where the data physically goes once a query is submitted, whose laws apply to it while it sits there, and who can force its production later. That's the data sovereignty problem, and a confidentiality review that stops at "they don't train on it" walks right past it.

A private LLM is the usual fix. Run the model inside an environment the firm controls and client data never leaves the network. True, and it matters. But "private" and "sovereign" are different claims, and the space between them is where firms get caught. A model can run on infrastructure a firm calls private and still sit under a jurisdiction that can reach the data.

⚖️ ABA Formal Opinion 512

The American Bar Association issued Formal Opinion 512 in July 2024. It maps existing duties onto generative AI use, among them competence (Rule 1.1), confidentiality (Rule 1.6), and supervision (Rules 5.1 and 5.3). The confidentiality piece is the one sovereignty turns on. A lawyer has to know where client information goes when it's submitted to a tool, and "goes" includes which servers hold it, which company operates them, and which legal system can demand it.

Where Client Data Goes, and Who Can Reach It

Model Rule 1.6(a) prohibits revealing information relating to a client's representation without the client's informed consent, and a violation doesn't require anyone to intend the disclosure. Pasting a privileged memo into a cloud tool is a transmission to an outside party, whatever the vendor's terms say about training. The contract language is a policy commitment. It doesn't change where the bytes traveled or who holds them afterward.

There's a privilege angle underneath that. Voluntarily disclosing privileged information to an outside party can weaken or waive the privilege, depending on the jurisdiction. Courts haven't settled how AI vendor access fits, and "we used a reputable vendor" is not a position anyone wants to argue in front of a judge. The architecture of keeping that data in-network is its own subject, covered in what a private LLM actually requires. Sovereignty is the layer on top. Even data that stays put answers to the law of wherever it physically rests, and to whoever operates the machine it rests on.

Sovereignty Is a Jurisdiction Question

Sovereignty covers more than keeping data off shared servers. The harder question is which government can reach it, and under what law.

A private instance running in a cloud region in Frankfurt answers to different rules than the same instance in Virginia. For matters with cross-border exposure, EU client data under the GDPR, regulated industries, or government work, the physical location of the compute matters as much as the isolation around it.

Location alone isn't the whole answer either. Under the U.S. CLOUD Act, an American provider can be compelled to produce data it controls even when the servers sit overseas. "We host in Europe" doesn't place the data beyond U.S. reach if the company operating it is American. So sovereignty has two inputs: where the data rests, and who controls the operator. On-premises hardware settles both, because the firm can name the building and the law that governs it. Most other arrangements settle one and leave the other open.
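The two-input test can be sketched in a few lines of Python. This is a simplified illustration, not a legal analysis: the Deployment class, its field names, and the country codes are hypothetical, and real reach depends on facts a data structure can't capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Hypothetical record of the two inputs named above."""
    name: str
    data_location: str          # country where the data physically rests
    operator_jurisdiction: str  # home jurisdiction of whoever runs the hardware


def reachable_from(d: Deployment) -> set[str]:
    """Jurisdictions with a plausible route to the data:
    the place it rests, plus whoever can compel the operator."""
    return {d.data_location, d.operator_jurisdiction}


def settles_both(d: Deployment, firm_jurisdiction: str) -> bool:
    """True only when location and operator both resolve to the firm's own jurisdiction."""
    return reachable_from(d) == {firm_jurisdiction}


# Illustrative values only.
on_prem = Deployment("on-premises", data_location="DE", operator_jurisdiction="DE")
eu_region_us_operator = Deployment("EU region, U.S. operator", "DE", "US")

print(settles_both(on_prem, "DE"))                # True
print(settles_both(eu_region_us_operator, "DE"))  # False
```

The second deployment fails the check even though the servers sit in the firm's own country, which is the "We host in Europe" case described above.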

Where "Private" Stops Short of Sovereign

"Private" gets used loosely in vendor decks, so it's worth being specific about what each arrangement actually controls.

On-premises hardware. The model runs on servers the firm owns, inside its own space. Data stays on machines the firm can point to, and no outside operator sits in the loop. This is the most sovereign option and the most expensive, and the firm's IT team or its managed provider carries the upkeep.

Private cloud instance. A dedicated, network-isolated environment inside a cloud provider. Your data stays separated from other tenants, which handles the confidentiality side. Sovereignty is only partial: the cloud operator still runs the hardware, and that operator answers to its home jurisdiction. A dedicated tenancy on a U.S. hyperscaler is still a U.S.-reachable environment.

Vendor-hosted private instance. A legal AI vendor runs an isolated deployment for the firm. Confidentiality rides on the contract and the architecture. Sovereignty rides on where the vendor runs it and who the vendor answers to, which is worth getting in writing rather than assuming.

No single model wins outright. The useful distinction is that "private" describes isolation while sovereignty describes reach, and a firm needs to know which one a given product is actually selling. For the hardware and retrieval details under each option, see what the technology actually requires.

Questions That Surface the Gaps

Most of this comes out in a handful of direct questions, asked before signing anything.

Where does inference run, by country? The query gets processed somewhere physical. Get the country, the region, and whether it can shift to another location under load.

Who is the legal operator of the environment? The entity that controls the machine is the entity a court or a government can compel. Know who that is and what jurisdiction they answer to.

Where are logs and outputs stored, and for how long? Retention stretches the exposure window. Query logs can hold the same privileged content the queries did.

Can the firm get audit logs? Supervision under Rules 5.1 and 5.3 needs records of what was submitted, what was retrieved, and what came back. If the vendor can't produce them, that's a gap you'll own.

What happens to the data on termination? Deletion timelines and return procedures belong in the contract, not in a support ticket two years later.

If the answers come back as contract language where you asked about architecture, note it. "We contractually restrict access" is a promise. "The model runs on your server, in your building" is a fact.
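The audit-log and retention questions above get easier to press when the firm knows what a usable record looks like. Here is a minimal Python sketch of one, under stated assumptions: the AuditRecord fields, the make_record helper, and the choice to store SHA-256 digests instead of full text are all hypothetical, chosen so the log itself doesn't become a second copy of privileged content.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AuditRecord:
    """Hypothetical audit entry: what was submitted, retrieved, and returned."""
    user: str
    matter_id: str
    submitted_sha256: str               # digest of the query, not the query itself
    retrieved_doc_ids: tuple[str, ...]  # IDs of documents pulled by retrieval
    output_sha256: str                  # digest of the model's answer
    created_at: str
    delete_after: str                   # retention deadline, set when the record is made


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_record(user, matter_id, query, retrieved_doc_ids, output,
                retention_days, now=None):
    """Build one record; `now` is injectable so the result is reproducible."""
    now = now or datetime.now(timezone.utc)
    return AuditRecord(
        user=user,
        matter_id=matter_id,
        submitted_sha256=_digest(query),
        retrieved_doc_ids=tuple(retrieved_doc_ids),
        output_sha256=_digest(output),
        created_at=now.isoformat(),
        delete_after=(now + timedelta(days=retention_days)).isoformat(),
    )


def to_log_line(record: AuditRecord) -> str:
    """Serialize to one JSON line for an append-only log."""
    return json.dumps(asdict(record), sort_keys=True)


record = make_record("associate-1", "M-001", "example query", ["doc-17"],
                     "example answer", retention_days=30)
print(to_log_line(record))
```

Digests let a reviewer confirm that a stored query or answer matches the log without the log holding the text. A firm that wants the full text available for supervision would keep it in-network under the same retention deadline.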

📊 Accuracy Doesn't Come With the Address

Keeping data in a known jurisdiction does nothing for whether the output is right. A 2024 Stanford RegLab study found that the legal-specific AI tools it tested returned incorrect information between 17% and 33% of the time. That rate is a property of how the models work, not where they run. Grounding the model in the firm's own vetted documents through retrieval brings it down. Human review closes the rest. More on that in why AI legal research still gets cases wrong.

The duty doesn't move with the hardware. Mata v. Avianca involved a cloud tool, and the lesson holds for a private one: the lawyer who signs the filing answers for what's in it, whatever produced the draft.

What This Looks Like in Practice

Firms tend to start with work that's high-volume and low-judgment, where keeping the underlying material inside a known boundary is the whole point.

Matter research and precedent lookup across the firm's own briefs and memos, with citations, without routing the query through an outside search service.

First-pass contract review against the firm's standard positions, flagging deviations for a lawyer to read before anything reaches the client.

Deposition and hearing prep over transcripts and exhibits from related matters, source material that would be risky to send anywhere outside the firm.

Document-heavy litigation support, where a production set can run to hundreds of thousands of files that should never touch external infrastructure.

The Governance Piece

A private LLM without a governing policy is just a confidential way to make mistakes. The policy has to name which tools are approved for what, who can authorize a new use, how output gets verified before it reaches a client or a court, and how staff are trained on the whole thing.

Sovereignty actually makes that policy shorter. When the data handling is settled by where the model runs, the policy can focus on use standards, verification, and supervision instead of re-arguing where client data goes for every tool on the list. What belongs in one is laid out in what belongs in a law firm AI policy.

The firms handling this well aren't waiting for the rules to harden. They're making defensible calls now and writing down the reasoning, so they can show their work when a client or a regulator asks where the data went.

Private AI That Stays Inside Your Jurisdiction

Cognetryx runs entirely inside your firm's environment. Client data stays in-network and in-jurisdiction. Queries, retrieval, and outputs land in an audit trail you control. No third-party API calls. No data egress.


Keith Kennedy, CISSP

Founder, Cognetryx

Keith is an IT thought leader with nearly 20 years of experience architecting secure technology solutions for regulated industries. He holds a CISSP certification and advises firms that handle privileged and confidential information on secure AI architecture, data governance, and keeping client data inside the network.