Hacker News

jabedude yesterday at 6:08 PM

But that's removing a component that's critical for the test. We, as users and benchmark consumers, care that the service as provided by Anthropic/OpenAI/Google is consistent over time given the same model/prompt/context.


Replies

plagiarist yesterday at 8:21 PM

Might as well have the free tokens, then, especially if it is an open benchmark they are already aware of. If they want to game it, they cannot be stopped from doing so when it's on their infra.