kqr, yesterday at 4:50 PM

It would unfortunately also need several runs of each to be reliable. There's nothing in TFA to indicate the results shown aren't to a large degree affected by random chance!

(I do think from personal benchmarks that Gemini 3 is better for the reasons stated by the author, but a single run from each is not strong evidence.)


Replies

casey2, yesterday at 4:58 PM

TFA says multiple times that the results are affected by random chance.
