Hacker News

thevinter, yesterday at 8:42 PM

Are you intentionally keeping the benchmarks private?


Replies

XCSme, yesterday at 8:52 PM

Yes.

I am trying to work out the best way to share as much information as possible about how the AI models fail, without revealing anything that could help them overfit to those specific tests.

I am planning to add some extra LLM calls to summarize the reason for each failure without revealing the test itself.