trothamel • yesterday at 3:39 PM

Offhand, do you know what format that data is in? Is it a question and then a human answering that question? Mostly just curious as to what the training data consists of.

Replies

jmalicki • yesterday at 3:55 PM

The most advanced training data is in the form of rubrics as rewards.

A human asks a question, then writes rubrics to judge the LLM's response. Rather than evaluating one specific response, those rubrics can live on as the LLM evolves and gives different answers. There are more complex variants as well, but that's the basic principle.

https://arxiv.org/abs/2507.17746
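
To make the idea concrete, here is a minimal sketch of scoring a response against a rubric. Everything in it is an assumption for illustration: the names `RubricItem` and `rubric_reward` are hypothetical, the weighted-fraction aggregation is just one simple choice and not necessarily what the linked paper does, and the keyword checks stand in for what would normally be an LLM or human judge deciding whether each criterion is met.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RubricItem:
    """One human-written criterion (hypothetical structure)."""
    description: str               # what the criterion asks for
    weight: float                  # how much it counts toward the reward
    check: Callable[[str], bool]   # stand-in judge; in practice likely an LLM call


def rubric_reward(response: str, rubric: List[RubricItem]) -> float:
    """Return the weighted fraction of rubric items the response satisfies."""
    total = sum(item.weight for item in rubric)
    if total == 0:
        return 0.0
    earned = sum(item.weight for item in rubric if item.check(response))
    return earned / total


# The rubric is written once per question, not once per answer.
rubric = [
    RubricItem("Says the dose depends on body weight", 2.0,
               lambda r: "weight" in r.lower()),
    RubricItem("Advises consulting a doctor", 1.0,
               lambda r: "doctor" in r.lower()),
]

# The same rubric scores answers from different model versions.
old_answer = "Take two tablets."
new_answer = "The dose depends on body weight; ask your doctor."

old_score = rubric_reward(old_answer, rubric)
new_score = rubric_reward(new_answer, rubric)
```

The point of the design is that the rubric is attached to the question, so it keeps producing a reward signal even when the model's answers change between training runs.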
