Hacker News

BoorishBears today at 8:07 AM

Lots of people are doing the same with extra steps (generating synthetic data from the test questions with the LLM, then training on it).

I wish we'd move past public test sets for LLM benchmarks: publish a plain-English explanation of the tasks, allow questions and clarifications, but never release a single question from the test set verbatim.

It made sense back when models needed to be finetuned on the task to even answer reliably. If we're saying this is the path to AGI, we should be able to rely on the model's generalization to get it right.


Replies

ting0 today at 8:22 AM

You have a problem with generating synthetic data from test questions? Humans simulate experiences in their minds. What's the problem?
