AI outperforms law professors in Stanford Law study

179 points • by berlianta • yesterday at 11:43 PM • 144 comments

https://law.stanford.edu/wp-content/uploads/2026/06/salinas_...

Comments

I find this study quite suspect. I'd have to dive deeper, but there are definitely significant alarm bells that should be going off for anyone reading.

Figure 2 (page 6) screams problems. There are only 16 professors (3k comparisons each?!?!) and the professors are all over the place. That's very high variance, suggesting the study has no meaningful statistical power. Poor instructor 16 can't catch a break lol

There's also really clear bias given that the main results only feature Google models. Other models show up elsewhere, why not there?

I'm no lawyer, but I'm a pretty competent statistician and can confidently say this paper has a smell to it. I can't call it bullshit, but there are red flags all over

causal • today at 12:46 AM

As a software engineer I have some intuition for what the risks are of letting agents do some tasks vs others.

I don't have a similar intuition calibrated for what could go wrong when asking AI to draft a legal document. Some things seem harmless, e.g. drafting a will, but I don't really know; our legal system is notoriously rife with footguns.

finnborge • today at 2:55 AM

I understand why the conversation on this article looks like it does, but the study is specifically focused on the potential for LLMs to operate as tutors for law students. I enjoy the extrapolation out to whether LLMs will replace lawyers, but did not find that to be discussed in the study itself.

In the framing of using LLMs as legal tutors, with the implication of lowering the cost of legal training, this seems like a socially-positive outcome. Furthermore, it feels kind of intuitive to me that any contemporary system operating with an LLM and access to legal reference material will be prepared to answer _student-originated questions_ comprehensively and with breadcrumbs or direct references to educational/source materials, as seems to have been found in the study.

The authors explicitly and intentionally emphasize that many legal questions require contextualization, as opposed to some discrete calculated answer. The result of the study implies that the LLM-based systems were capable of using what many of us here understand to be the "stochastic best-fit algorithmic generation" of a contemporary language model to adequately contextualize a student's question, providing insight into the trade-offs or complications implicit in the question, while then, critically, _meeting the professional standards of legal educators in explaining that complexity to a student_.

Realistically, I would hope this provides some confidence to readers of HN that they can actually ask a legal question to an LLM and expect the response will explain the complexity of the law in relation to the question. This is great news, and is likely the minimal pre-work any of us should do before actually consulting a lawyer, if time permits.

On the other hand, I do _not_ think that this study provides any indication that an LLM is prepared to actually provide direct legal counsel. Possibly in the same way that a legal textbook does not replace legal counsel, or perhaps more accurately, the same way that stumbling upon a legal case study for approximately the same situation you're in doesn't guarantee you'll have the same result.

quantisan • today at 2:14 AM

I'm surprised Stanford Law would go along with this over-reaching press release title. How about "For common first-year contracts-law questions, law professors preferred AI-generated answers to professor-generated answers"

chewbacha • today at 1:07 AM

My best guess is that Gemini was trained on the textbooks that the questions are meant to test against, thus they are probably better at explicit recall of those questions or related questions.

This is a pretty limited introductory course based on what it says in the methods of the paper itself.

rockskon • today at 3:30 AM

I do question at what point AI could be useful as a teaching aid.

The quality of LLMs depends heavily on, among other things, how you word your questions.

Knowing the correct questions to ask is not something most students know how to do given that it tends to require a fair bit of pre-existing domain knowledge.

applicative • today at 2:09 AM

What the LLM cannot do is explain why it said what it said, when cross-examined. It simply hallucinates the best account of why someone would have said such a thing as it said, same as it can give a probable account of why someone else said something different. The question 'But why did you say this not that ...?' does not lead it to make explicit its grounds for what it said, but just to make a new more complicated statement.

mchl-mumo • today at 5:59 AM

16 is such a small number for what they phrase as an important finding. It really couldn't be much harder to coordinate with 100+ professors.

teiferer • today at 5:57 AM

Question is: if a legal question is answered incorrectly by an LLM, who is going to be held responsible?

epicureanideal • today at 3:06 AM

One way to make legal services more affordable and accessible would be to put the burden of ensuring the AI legal services are accurate on a private-public partnership with the government.

If a person using the service is given inaccurate legal advice and acts on that advice, the person can't be charged with a crime, can't be given any civil penalties, etc., as long as the law in question is non-obvious.

Obviously if by some exploit, some fundamentally obvious crime (murder, theft, obvious fraud, etc.) is said to be legal, that wouldn't apply, but of course the service should try to prevent those kinds of exploits anyway.

Could limit this to something like business regulations to begin with, or even specifically for small businesses, or contracts within some time limit and dollar amount that would otherwise be coverable by small claims court, etc.

throw7 • today at 1:10 AM

Oh, a "Human-Centered" study by an AI lover:

Julian Nyarko

    Professor of Law
    Co-Chair Stanford Law AI Initiative
    Senior Fellow, Stanford Institute for Human-Centered AI (HAI)

LOL!

flanked-evergl • today at 6:05 AM

Will the AI also outperform law professors at applying two tier justice and ensuring criminal have no consequences as long as they are brown or black? Or do we still need humans for that?

KnuthIsGod • today at 2:01 AM

In the hands of a domain expert, AI is useful. In the hands of the naive, it is a foot gun.

I killed my Arch installation and was stuck at the GRUB prompt. Unwilling to brush up my rusty knowledge of GRUB syntax, I asked Gemini for help. The commands Gemini suggested would have wiped my hd...

Once Gemini was told that I was using BTRFS, the suggestion from Gemini looked a bit more sane, but still looked incorrect to me.

It was only after I informed Gemini that I was using an NVMe drive with BTRFS that it finally produced a sane command.

vessenes • today at 4:18 AM

* Gemini 2.5 Pro (no outside resources), and
* NotebookLM (not versioned -- with added legal resources).

NotebookLM was considered slightly better than 2.5 Pro by the evaluators.

eichi_uehara • today at 2:02 AM

I beat lawyers twice before generative AI even existed. Recently I asked Gemini a few questions about personal conflicts in everyday life. It's often too conservative, with views too shallow for the problem. So I still handle human conflicts myself. I only outsource the templated stuff like routine chat replies or marketing copy, though that saves me a huge amount of time. People who quote AI in serious conflicts are too weak to handle them on their own.

Aperocky • today at 1:56 AM

> rated AI responses significantly higher than answers written by other professors, with AI winning 75% of head-to-head matchups.

That's the problem: you never know when the 25% will deliver a true stink bomb, and that's not considering prompting. While a fair prompt/question may be considered objective, it's very easy to stray.

airstrike • today at 1:17 AM

Yes, LLMs are great at search. That's not news.

Esophagus4 • today at 12:58 AM

Yeah this could be interesting. A lot of the spotlight has been on “law firm stuff” like demand letters and writing contracts…

But imagine if a dev team didn’t have to go engineer -> product manager -> legal team to get a question answered on local data retention requirements. You could ship that much faster.

galaxyLogic • today at 2:27 AM

I'm going to need some legal help for my startup. But I can't pay much. So I figured I will ask AI all the relevant questions, have it fill out forms, etc. Perhaps even have it create a patent application for me.

THEN I find a human lawyer and give AI's answers to them and say "Can you find any errors in this? Can you improve it?"

That way I think my legal bills should be smaller because the AI has already done most of the work. What do you think? Which LLM is best for legal work?

wilg • today at 12:45 AM

> In a blind evaluation of nearly 3,000 anonymized comparisons, professors rated AI responses significantly higher than answers written by other professors, with AI winning 75% of head-to-head matchups.

75% win rate seems pretty good!
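
For anyone who wants to sanity-check that number, here is a rough back-of-envelope sketch, not the paper's method. It assumes exactly 3,000 comparisons (the quote only says "nearly 3,000"), 16 raters, and a made-up intra-rater correlation of 0.1 for the clustering adjustment; the function names are mine. It computes a Wilson interval for the 75% win rate, first treating every comparison as independent, then shrinking the sample size with a Kish design effect to reflect that the comparisons come from a small number of raters.

```python
import math


def wilson_interval(wins, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def effective_n(n, cluster_size, icc):
    """Kish design effect: shrink n when observations are clustered."""
    return n / (1 + (cluster_size - 1) * icc)


n = 3000          # assumption: "nearly 3,000" rounded up
raters = 16       # number of professors mentioned in the thread
win_rate = 0.75   # headline figure from the quote
icc = 0.1         # hypothetical intra-rater correlation, NOT from the paper

# Naive interval: every comparison treated as independent.
naive = wilson_interval(win_rate * n, n)

# Adjusted interval: comparisons clustered within raters.
n_eff = effective_n(n, n / raters, icc)
adjusted = wilson_interval(win_rate * n_eff, n_eff)

print(f"naive 95% interval:    {naive[0]:.3f} to {naive[1]:.3f}")
print(f"effective sample size: {n_eff:.0f}")
print(f"adjusted 95% interval: {adjusted[0]:.3f} to {adjusted[1]:.3f}")
```

Under these assumptions the adjusted interval is much wider than the naive one but still sits above 50%. The real answer depends on how correlated each professor's ratings actually are, which can't be known without the underlying data.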

Paper link: https://law.stanford.edu/wp-content/uploads/2026/06/salinas_...

king_zee • today at 12:38 AM

I think there will be a market for firms that aggressively market themselves as non-AI, and then as more people turn towards that human connection we'll go full circle

gamblor956 • today at 5:11 AM

While they provided the questions that professors and LLMs were asked to respond to, they don't include any of the answers from either the humans or the LLMs, so there's no way to independently verify that the LLMs actually returned "better" answers.

Given the number of responses the professors were asked to rate (200 each), they probably graded them the same way that bar exam responses are graded: quickly and superficially. Not surprising that LLMs achieved higher scores in this scenario, since they excel at producing superficially nice answers that don't hold up under scrutiny.

Also...unless statistics has changed in the past 2 decades, the math in the charts doesn't math. That's probably why they're leaving out the actual numerical data. I also wouldn't be surprised if we learn in the coming days that the charts were AI generated.

gaiagraphia • today at 1:13 AM

Incredible that the common people will be able to wrestle the right to rule of law away from the bloated legal caste, who have built themselves quite the moat.

The inaccessibility of justice is a huge driver of inequality. Any tools which bridge this gap will help make a more just society.

homeonthemtn • today at 12:59 AM

Personally I think this is very good. One of the hardest things out there is maintaining a society in the face of changing times and it's because law is dense and slow.

I think, in the right hands, this could be huge.

Thaxll • today at 1:12 AM

AI will never convince a jury though.

rimliu • today at 5:42 AM

Yes yes, the IPO is near.

t0lo • today at 1:32 AM

Library outperforms student... more news at 9

bko • today at 1:01 AM

Marc Andreessen argued that we've already reached AGI. He says that the top AI models give better answers than 99% of people he has access to, and he has access to some of the best people in their field.

I'm getting more convinced. I mean, sure it makes dumb mistakes sometimes, but it's a particular set of self-serving mistakes, like commenting out tests in order to pass. We obv don't want this behavior but I wouldn't say it's dumb.

It'll be like the Turing test, which we just blew past years ago and no one cared. After all the hand-wringing about sentience and the rights of AI if it passes the Turing test, we now just have AI bots running 24/7 writing slop.

How does everyone else feel?

steele • today at 12:54 AM

in mice

t0lo • today at 1:36 AM

More great news from the prestigious university where 40% of students claim they are disabled

https://fortune.com/article/rise-in-elite-students-seeking-a...

and where they wanted to ban words such as "chief", "stupid", "karen" and "American"

https://reason.com/2022/12/21/stanford-elimination-harmful-l...

34981t • today at 12:30 AM

He is basically an AI professor for law. This study just confirms his existence:

https://juliannyarko.com/

Stanford and its donors of course want to replace anyone but its administrators, so they cheer on such anti-intellectual nonsense.
