Hacker News

riku_iki, last Thursday at 6:54 PM

> I'm pretty skeptical of that 75% on GPQA Diamond for a non-reasoning model.

Could that benchmark have simply leaked into the training data, as many others have?