Hacker News

throwaw12 today at 4:51 PM

how do you know it's benchmaxxed?


Replies

solenoid0937 today at 5:04 PM

Friends at Meta with access to the model + personal experience at Meta.

Meta's performance process is essentially "show good numbers or you're out." So guess what people do when they don't have good numbers? They fudge them. Happens all across the company.

luma today at 5:29 PM

For one, they aren't using the latest version of many of the benchmarks, e.g., ARC-AGI 2 and not 3.

prodigycorp today at 5:09 PM

Meta's benchmaxxing tendencies are well known. Llama 4 was mega benchmaxxed, and there's nothing that suggests to me that Meta's culture has changed.
