Interesting. I am wondering why would anyone use a t-test when the experiment is clearly modelled ...

331c8c71 • yesterday at 8:01 AM • 2 replies

Interesting.

I am wondering why anyone would use a t-test when the experiment is clearly modelled by a binomial distribution: 250 independent questions, each of which is either answered correctly or not (the null is that the success rate is the same).
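For readers who want to see what the binomial framing looks like in practice, here is a minimal sketch of a two-proportion z-test (the normal approximation to the binomial) using only the Python standard library. The function name and the counts are hypothetical and not taken from the paper, and the test assumes the two conditions are independent samples, which may not hold if both prompts were run on the same questions.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(successes_a, successes_b, n):
    """Two-sided z-test of H0: both conditions share one success rate.

    Assumes n independent right/wrong trials per condition.
    """
    p_a = successes_a / n
    p_b = successes_b / n
    # Under the null, estimate the common rate from the pooled counts.
    pooled = (successes_a + successes_b) / (2 * n)
    se = sqrt(pooled * (1 - pooled) * 2 / n)
    if se == 0:
        # All trials succeeded or all failed: no evidence of a difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 200/250 correct with one prompt, 210/250 with another.
z, p = two_proportion_z_test(200, 210, 250)
print(f"z = {z:.3f}, p = {p:.3f}")
```

With only one right/wrong outcome per question this is a natural model; it stops fitting once each question's score is an average over several runs.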

Replies

jampekka • yesterday at 8:28 AM

The methods could be better described in the paper, but my understanding is that they did 10 runs for each question for each prompt and took the average of those, so the compared values are not binary. You could do a sign test, but you'd lose power and answer a slightly different question.
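To make the trade-off concrete, here is a rough sketch that computes a paired t statistic and an exact two-sided sign test on per-question averages. The scores and function names are made up for illustration; each value stands for one question's mean accuracy over its runs, and nothing here reproduces the paper's data or exact procedure.

```python
from math import comb, sqrt
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """t statistic for the mean of per-question differences a[i] - b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def sign_test_p_value(a, b):
    """Exact two-sided sign test; tied pairs are dropped."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        return 1.0
    k = sum(d > 0 for d in diffs)
    # Probability of a split at least this lopsided under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical per-question mean accuracies under two prompts.
prompt_a = [0.9, 0.7, 1.0, 0.6, 0.8, 0.5]
prompt_b = [0.8, 0.7, 0.9, 0.4, 0.9, 0.3]

print("paired t statistic:", paired_t_statistic(prompt_a, prompt_b))
print("sign test p-value:", sign_test_p_value(prompt_a, prompt_b))
```

The sign test only looks at which prompt did better on each question and ignores by how much, which is where the loss of power comes from. It also tests whether the median difference is zero rather than the mean difference.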


plewd • yesterday at 8:17 AM

I don't know much about stats, but does "the null is that the success rate is the same" imply that it's a sketchy methodology because they can come up with some findings ("ruder prompts are better/worse!") more often?

