Hacker News

great_psy yesterday at 9:37 PM

How do you measure quality at scale? Is there another model that determines whether the output adheres to the codebase's standards?


Replies

swyx yesterday at 9:46 PM

see Beyond Unit Tests and Novel Grading Methods in TFA.

i think it's something like ~60% LLM-as-judge rubrics, with the rest graded as described there. every rubric is validated by a maintainer. 3000 rubrics in total.