Hacker News

jmalicki, yesterday at 5:37 PM

That is a great example of the kind of thing they're paying people to create as training data.

You write the prompt, then write rubrics to judge the responses, and you've found something the model failed at. Congratulations, you just earned $500. Now do it again.


Replies

macleginn, yesterday at 8:28 PM

Not the worst way to make money, but if internet-scale data were not enough to reduce errors to a somewhat tolerable margin, how much data do they hope to collect in this manner?
