Sure, but the biggest problem is they have no statistical significance. Variance is too high. How do you distinguish the signal from the noise? Confidence intervals aren't enough.
But is it a surprise law professors aren't great statisticians?
I disagree. 16 isn't necessarily the relevant N here; the number of responses is.
If you have 100 responses from 1 professor, and the AI wins 75% of the time, that is very likely a true signal that the AI is better than this prof. It would be incorrect to generalize this to all profs though.
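To put a rough number on that, here's a quick sketch of an exact one-sided binomial test. It assumes the 100 responses are independent and that "no difference" means the AI wins each comparison with probability 0.5. Both are my assumptions, and `binom_tail` is just a helper name I made up.

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Chance of seeing 75 or more AI wins out of 100 if the AI and the prof
# were actually evenly matched (assumed null: p = 0.5).
p_value_75_of_100 = binom_tail(75, 100, 0.5)
print(f"P(>= 75 wins out of 100 | p = 0.5) = {p_value_75_of_100:.2e}")
```

If the responses are correlated (same exam, same grader), the effective N is smaller and the real tail probability is larger than this.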
Further, if you sample 16 profs and the AI beats 10 of them, you can be fairly certain that the real percentage of profs it beats isn't 10%. Also, when estimating the probability that the AI beats a random prof, it's the relative estimation error that scales with 1/sqrt(N). If you have a coin and it lands heads up 16 times in 16 flips, that tells you something quite robust about the coin.
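A minimal sketch of both claims, assuming the profs are an independent random sample and reading the coin example as 16 heads in 16 flips (my reading, not spelled out above). The helper name `binom_tail` is hypothetical.

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# If the AI truly beat only 10% of profs, how often would it beat
# 10 or more out of a random sample of 16?
p_10_of_16 = binom_tail(10, 16, 0.10)

# If the coin were fair, how often would it land heads 16 times in 16 flips?
p_16_heads = binom_tail(16, 16, 0.5)

print(f"P(>= 10 of 16 | true rate 10%) = {p_10_of_16:.2e}")
print(f"P(16 heads in 16 | fair coin)  = {p_16_heads:.2e}")
```

Both probabilities are tiny, which is the point: N = 16 is plenty to rule out hypotheses that are far from what you observed, even if it can't pin down the exact rate.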
Reasonably estimating confidence intervals at small N and high p is not trivial. But it can be done.
A good heuristic is "add 2 successes and 2 failures", which is due to Agresti & Coull.
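For the curious, here's roughly what that heuristic looks like in code. This is a sketch of the approximate 95% "plus four" interval: add 2 successes and 2 failures, then apply the usual normal-approximation formula to the adjusted counts. The function name and the clamping to [0, 1] are my own choices.

```python
from math import sqrt

def plus_four_interval(successes, n, z=1.96):
    """Approximate 95% interval: add 2 successes and 2 failures,
    then use the normal approximation on the adjusted proportion."""
    n_adj = n + 4
    p_adj = (successes + 2) / n_adj
    margin = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp to [0, 1] since a proportion cannot fall outside that range.
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# 10 of 16 profs beaten
lo_10, hi_10 = plus_four_interval(10, 16)
print(f"10/16: ({lo_10:.3f}, {hi_10:.3f})")

# 16 of 16: the naive interval would collapse to a single point at 1.0
lo_16, hi_16 = plus_four_interval(16, 16)
print(f"16/16: ({lo_16:.3f}, {hi_16:.3f})")
```

The 16/16 case shows why the adjustment matters: the plain p ± z·sqrt(p(1-p)/N) interval has zero width when p is exactly 0 or 1, while the adjusted one still gives a sensible lower bound.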
See down the page here for source papers:
https://en.wikipedia.org/wiki/Binomial_proportion_confidence...