Hacker News

aomix, 01/21/2025

They use other models to judge correctness and, when possible, just ask the model to output something that can be directly verified, like answers to math problems that can be checked 1:1 against the correct answer.
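
To make the "directly verified" part concrete, here is a rough sketch of what a 1:1 answer check could look like. This is only an illustration, not anyone's actual training code: the names `normalize` and `verifiable_reward` are made up for this example, and the normalization rules (trim whitespace, drop a trailing period, treat plain numbers and fractions as equal when they have the same value) are assumptions.

```python
from fractions import Fraction


def normalize(answer: str) -> str:
    """Canonicalize an answer string so equivalent forms compare equal.

    Hypothetical helper: the exact rules here are assumptions.
    """
    text = answer.strip().rstrip(".").replace(" ", "")
    try:
        # Plain numbers and simple fractions ("0.5", "1/2") map to one form.
        return str(Fraction(text))
    except (ValueError, ZeroDivisionError):
        # Anything else falls back to a case-insensitive string comparison.
        return text.lower()


def verifiable_reward(model_output: str, reference: str) -> float:
    """Return 1.0 if the model's answer matches the reference, else 0.0."""
    return 1.0 if normalize(model_output) == normalize(reference) else 0.0


if __name__ == "__main__":
    print(verifiable_reward("0.5", "1/2"))
    print(verifiable_reward("41", "42"))
```

The point is that no judge model is needed for this kind of check: the reward comes from a deterministic comparison, so it only works for tasks where a single reference answer exists.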