logoalt Hacker News

331c8c71yesterday at 8:01 AM2 repliesview on HN

Interesting.

I am wondering why would anyone use a t-test when the experiment is clearly modelled by a binomial distribution: 250 independent questions and each one is either answered correctly or not (the null is that the success rate is the same).


Replies

jampekkayesterday at 8:28 AM

The methods could be better described in the paper, but my understanding is that they did 10 runs for each question for each prompt and took an average of those, so the compared values are not binary. You could do a sign test, but you'd lose power and answer a bit different question.

show 1 reply
plewdyesterday at 8:17 AM

I don't know much about stats, but does "the null is that the success rate is the same" imply that it's a sketchy methodology because they can come up with some findings ("ruder prompts are better/worse!") more often?

show 3 replies