One classic problem in all of ML is ensuring that the benchmark is representative and that the algorithm isn't overfitting to it.
This remains an open problem for LLMs - we don't have true AGI benchmarks, and LLMs frequently learn the benchmark problems without necessarily getting much better in the real world. Gemini 3 has been hailed precisely because it delivered large gains across the board that don't appear to come from overfitting to benchmarks.
This could be a solved problem: write problems that aren't anywhere online and compare models against them. Later, use LLMs to sort through your problem set and classify each one from easy to difficult. A rough sketch of that workflow is below.
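As a minimal sketch, assuming an OpenAI-compatible client, placeholder model names, and a hand-written `problems.jsonl` that has never touched the public internet (all three are assumptions, not anything prescribed above):

```python
# Private-benchmark harness: score models on offline problems, then have an
# LLM bucket each problem by difficulty. Model names and the file path are
# placeholders - swap in whatever you actually compare.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(model: str, question: str) -> str:
    """Get one short answer from `model` for a single benchmark question."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question + "\nAnswer concisely."}],
    )
    return resp.choices[0].message.content.strip()

def score(model: str, problems: list[dict]) -> float:
    """Fraction answered correctly, with deliberately naive substring grading."""
    correct = sum(
        p["answer"].lower() in ask(model, p["question"]).lower() for p in problems
    )
    return correct / len(problems)

def classify_difficulty(problem: dict, judge: str = "gpt-4o") -> str:
    """Use an LLM judge to sort a problem into an easy/medium/hard bucket."""
    label = ask(
        judge,
        "Rate this problem's difficulty as exactly one word "
        f"(easy, medium, or hard):\n\n{problem['question']}",
    )
    return label.lower()

if __name__ == "__main__":
    # problems.jsonl: one {"question": ..., "answer": ...} object per line
    problems = [json.loads(line) for line in open("problems.jsonl")]
    for model in ["gpt-4o", "gpt-4o-mini"]:
        print(model, score(model, problems))
    for p in problems:
        print(classify_difficulty(p), "-", p["question"][:60])
```

The grading here is crude on purpose; the property that matters is that the questions stay offline, so the next training crawl can't absorb them and quietly turn your benchmark into training data.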