ALittleLight • today at 3:47 AM

The paper says the professors made a median of 200 comparisons each. It also says they used only 2 models because using more models would require more comparisons, and they selected Google models because Google was branded/advertised as being education focused. When you see other models show up elsewhere, that's because they extended the main idea to other models, but with LLMs as judges instead of human professors.

Replies

godelski • today at 3:50 AM

Sure, but the biggest problem is they have no statistical significance. Variance is too high. How do you distinguish the signal from the noise? Confidence intervals aren't enough.

But is it a surprise that law professors aren't great statisticians?
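
To make the signal-versus-noise question concrete, here is a rough sketch of how far a win rate has to sit from 50% before 200 pairwise comparisons can tell it apart from a coin flip. The only number taken from the thread is n = 200 (the median per professor); the win counts are hypothetical, not figures from the paper, and the function names are made up for illustration. It also assumes every comparison is independent, which ignores variance between raters and between prompts, so treat it as a best case and not as the paper's analysis.

```python
import math


def binomial_two_sided_p(wins: int, n: int) -> float:
    """Exact two-sided p-value for H0: each comparison is a fair coin flip."""
    # Under p = 0.5 the binomial is symmetric, so double the smaller tail.
    k = min(wins, n - wins)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def wilson_interval(wins: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for the true win rate."""
    p_hat = wins / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


n = 200  # median comparisons per professor, per the thread
for wins in (105, 110, 115, 120):  # hypothetical win counts, not from the paper
    p_value = binomial_two_sided_p(wins, n)
    low, high = wilson_interval(wins, n)
    print(f"{wins}/{n} wins: p = {p_value:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

Even under these generous assumptions, a modest preference for one model is hard to separate from chance at this sample size, and once rater-to-rater variance is included the intervals only get wider.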
