Hacker News

codingdave · yesterday at 11:49 PM

IANAL, but this seems like an odd test to me. Judges do what their name implies - make judgment calls. I find it reassuring that judges get different answers under different scenarios, because it means they are listening and making judgment calls. If LLMs give only one answer, no matter what nuances are at play, that sounds like they are failing to judge and instead are reducing the thought process to black-and-white thinking.

Digging a bit deeper, the actual paper seems to agree: "For the sake of consistency, we define an “error” in the same way that Klerman and Spamann do in their original paper: a departure from the law. Such departures, however, may not always reflect true lawlessness. In particular, when the applicable doctrine is a standard, judges may be exercising the discretion the standard affords to reach a decision different from what a surface-level reading of the doctrine would suggest."


Replies

scottLobster · today at 12:28 AM

Yeah, I'm reminded of the various child porn cases where the "perpetrator" is a stupid teenager who took nude pics of themselves and sent them to their boy/girlfriend. Many of those cases have been thrown out by judges because the letter of the law creates an absurdity where the teenager is somehow a felon child predator who solely preyed on themselves, and sending them to jail and forcing them onto a sex offender registry would just ruin their lives while protecting nobody and wasting the state's resources.

I don't trust AI in its current form to make that sort of distinction. And sure, you can say the laws should be written better, but so long as the laws are written by humans, that will simply not be the case.

deepsun · today at 12:32 AM

The main job of a judicial system is to appear just to people. As long as people think it's just, everyone is happy. But if rulings strictly follow the law and people still consider them unjust, revolutions happen.

In both cases, lawmakers must adapt the law to reflect what people think is "just". That's why some countries have jury duty - to involve people in the rulings, so they see that the system is just.

6LLvveMx2koXfwn · today at 8:11 AM

> I find it reassuring that judges get different answers under different scenarios

Unfortunately, as the aptly titled 'Noise' [1] demonstrated oh so clearly, judges tend to make different judgement calls in the same scenarios at different times.

1. Noise - https://en.wikipedia.org/wiki/Noise:_A_Flaw_in_Human_Judgmen...

bawolff · today at 1:34 AM

> Judges do what their name implies - make judgment calls. I find it reassuring that judges get different answers under different scenarios, because it means they are listening and making judgment calls.

I disagree - the law should be the same for everyone. Yes, sometimes crimes have mitigating circumstances, and those should be taken into account. However, that seems like a question separate from what is and is not illegal.

swalsh · today at 12:08 AM

I believed that too until I watched the Karen Read trials. The judge had a bias, and it was clear Karen got justice despite the judge trying to put her finger on the scale.

snitty · today at 1:57 AM

So here the test was effectively: given a set of relevant facts, can we influence the way a judge (or LLM) rules by adding superfluous facts? The judges were either confused or swayed by the superfluous facts; the LLM was not. And the matter was one where, under US law, the outcome should have been determined by the law, not by judgment.

vjulian · today at 12:48 AM

The legal system leaves much to be desired in relation to fairness and equity. I'd much prefer a multi-staged approach: 1) an AI analysis, 2) a judge review, with a high bar of analysis required to disagree with the AI, 3) public availability of the deliberations, and 4) an appeals process.

vidarh · today at 9:23 AM

Even in that case, if these systems can be proven good enough, rules that require them to be consulted, and that require the judge to justify any deviation from the automated reasoning, might be good.

To draw a parallel to a real system: in Norway, a lot of cases are heard by panels of judges that include a majority (usually 2 or 3) of lay judges and a minority (usually 1 or 2) of professional judges. The lay judges are people without legal training who effectively function like a "mini jury", but unlike in a jury trial, the lay judges deliberate with the professional judges.

The professional judges in this system have the power to override if the lay judges are blatantly ignoring the law, but this is generally considered a last resort. That power requires the lay judges to justify themselves if they intend to make a call the professional judges disagree with. Despite that, it is not unusual for the lay judges to come to a judgement different from the professional judges', and fairly rare for their choices to be overridden.

The end result is somewhere in the middle between a jury and "just" a judge. If it is proven - with far more extensive testing - that its reasoning is good enough, an LLM could serve a similar function: providing an assessment of what the law says about the specific case, and leaving it to humans to determine if and why a deviation is justified.

tylervigen · today at 12:09 AM

Yes, your view is commonly called "legal realism."

raw_anon_1111 · today at 4:28 AM

You have a lot more faith in judges not being biased than I do. I'm about to say something that really makes me throw up a little in my mouth, because it harkens back to the forced, banal DEI training I had to suffer through in 2020 at BigTech [1]…

But judges have all sorts of biases, both conscious and unconscious - where little Jacob and little Jerome do the same mischief, but Jacob is just "a kid being a kid" while Jerome is "a thug in training who we need to protect society from".

[1] yes I’m well aware that biases exist. Not only did my still living parents grow up in the Jim Crow South. We had a house built in an infamous what was a “sundown town” as recently as 1990.

We have seen how quickly the BS corporate concern turned out to be just marketing when that was convenient.

droidjj · today at 12:08 AM

Whether it’s reassuring depends on your judicial philosophy, which is partly why this is so interesting.

godelski · today at 3:32 AM

IANAL. One thing I like to say is

  There is no rule that can be written so precisely that there are no exceptions, including this one.
A joke[0], but one I think people should take seriously. Law would be easy if it weren't for all the edge cases. Most things in the world would be easy if it weren't for all the edge cases[1]. You can see this just by contemplating whatever domain you feel you have achieved mastery over and have worked with for years. You likely don't actually feel you have achieved mastery, because you've developed to the point where you know there is so much you don't know[2].

The reason I wouldn't want an LLM judge (or any algorithmic judge) is the same reason I despise bureaucracy. Bureaucracy fucks everything up because it makes the naive assumption that you can figure everything out from a spreadsheet. It is the equivalent of trying to plan a city from the view out of an airplane window. The perspective has some utility, but it is also disconnected from reality.

I'd also say that this feature of the world is part of what created us and made us the way we are. Humans are so successful because of our adaptability. If this weren't a useful feature, we'd have become far more robotic, because that would be a much easier thing for biology to optimize. So when people say bureaucracies are dehumanizing, I take it quite literally. There's utility to it, but that utility leads to overuse, and the bias is clear: it is much harder to "de"-implement something than to implement it. We should strongly consider that bias when making large societal decisions like implementing algorithmic judges. I'm sure they can be helpful in the courtroom, but to abdicate our judgements to them only results in a dehumanized justice system. There are multiple literal interpretations of that claim too.

[0] You didn't look at my name, did you?

[1] https://news.ycombinator.com/item?id=43087779

[2] Hell, I have a PhD, and I forget I'm an expert in my domain because there's just so much I don't know that I continue to feel pretty dumb (which is also a driving force to continue learning).

fluidcruft · today at 1:06 AM

There are findings of fact (what happened, context) and findings of law (what the law means given the facts). I don't think inconsistency in findings of law is acceptable, really. If laws are bad, fix the laws, or have precedent applied uniformly, rather than have individual random judges invent new law from the bench.

Sentencing is a different thing.

ralusek · today at 6:33 AM

Disagree completely. Judgement of the sort you're describing should be done at the legislative phase (i.e. writing code).

Inconsistent execution/application of the law is how bias happens. If a judgement done to the letter of the law feels unjust to you, change the letter of the law.

latchkey · today at 12:07 AM

In 30 seconds, did the entire corpus of all the legal cases since the dawn of time agree with the judge's opinion on my case? For the state of things in AI today, I'll take it as a great second opinion.

homeonthemtn · today at 1:42 AM

I don't think a lot of people understand the grueling nature of being a judge. Day in and day out, years of cases are going to generate bias in a judge in one form or another. I wouldn't mind an AI check* to help them check that bias.

*A magically thorough, secure, and well-tested AI

gowld · today at 12:07 AM

A mistake isn't "judgment".

These were technical rulings on matters of jurisdiction, not subjective judgments on fairness.

"The consistency in legal compliance from GPT, irrespective of the selected forum, differs significantly from judges, who were more likely to follow the law under the rule than the standard (though not at a statistically significant level). The judges’ behavior in this experiment is consistent with the conventional wisdom that judges are generally more restrained by rules than they are by standards. Even when judges benefit from rules, however, they make errors while GPT does not.

qwertox · today at 12:30 AM

> If LLMs give only one answer, no matter what nuances are at play, that sounds like they are failing to judge and instead are reducing the thought process to black-and-white thinking.

You can have a team of agents exchange views, and maybe the protocol would even allow for settling cases automatically. The more agents you have, the more nuance you capture.
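
As a very rough sketch of what that could look like (purely illustrative; ask_model() is a hypothetical stand-in for whatever LLM API you would actually call, not a real library function):

  # Hypothetical panel of LLM "judges" that deliberates and settles by majority.
  def ask_model(prompt: str, seed: int) -> str:
      # Stand-in for a real LLM call; each seed plays a distinct agent.
      return "affirm" if (len(prompt) + seed) % 2 == 0 else "reverse"

  def deliberate(case_facts: str, n_agents: int = 5, rounds: int = 2) -> str:
      opinions = [ask_model(case_facts, seed=i) for i in range(n_agents)]
      for _ in range(rounds):
          # Each agent sees the panel's current opinions and may revise its own.
          panel = ", ".join(opinions)
          opinions = [ask_model(case_facts + "\nPanel so far: " + panel, seed=i)
                      for i in range(n_agents)]
      return max(set(opinions), key=opinions.count)  # settle by simple majority

  print(deliberate("Defendant moved to dismiss for lack of personal jurisdiction."))

Whether more agents actually yields more nuance, rather than averaging it away, is something this kind of setup would let you test.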
