Hacker News

xmddmx, yesterday at 9:28 PM

The concept you need here is "Statistical Power".

The ELI5 version is that there are two mistakes you can make when looking at a P value:

Type I error: your P value comes out low even though there is no real effect. In the experiment being discussed here, that would lead one to conclude that AI code is worse when it isn't. Otherwise known as a false positive.

Type II error: your P value comes out high even though there is a real effect, leading you to conclude that AI code is no different when it actually is. Otherwise known as a false negative.

https://en.wikipedia.org/wiki/Power_(statistics)

Power is the probability of avoiding a Type II error when a real effect exists, and one can calculate it for a given experimental protocol.
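To make that concrete, here is a rough sketch of a power calculation for the simplest case: comparing the means of two equal-sized groups with a two-sided test, using the normal approximation. The function name `two_sample_power`, the standardized effect size of 0.5, and the sample sizes are all assumptions for illustration; none of them come from the experiment being discussed, and a real analysis would need to match that experiment's actual design.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of means.

    effect_size is the standardized difference between group means
    (Cohen's d). Uses the normal approximation, so it is slightly
    optimistic for very small samples.
    """
    std_normal = NormalDist()
    # Critical value for a two-sided test at significance level alpha.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect is real.
    shift = effect_size * sqrt(n_per_group / 2)
    # Probability that the statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical numbers: a medium effect (d = 0.5) at several sample sizes.
for n in (10, 64, 200):
    print(n, round(two_sample_power(0.5, n), 3))
```

Under these assumptions, a medium-sized effect needs roughly 64 subjects per group to reach about 80% power, and with only 10 per group a real effect of that size would be missed most of the time.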

My hunch is that if you did this, you would find this experiment is grossly under-powered.

This means you can't make the "absence of evidence" claim.


Replies

davrosthedalek, today at 12:50 AM

He can't make the evidence of absence claim, but he can absolutely make the absence of evidence claim.
