Hacker News

SubiculumCode today at 4:51 PM (2 replies)

No. Most data generated this way is poor quality. The value isn't in the user's queries or responses: if the user doesn't know better than the LLM, you just end up generating bad responses. The value is in taking a superior model, submitting a query, getting a higher-quality output than you could have produced yourself, and using that to boost your own model.


Replies

cedws today at 7:14 PM

AI companies have been using synthetic data for ages now. The data doesn't need to yield new insights to be useful for training.

Tostino today at 5:16 PM

You identify users doing real work and implementing a project over a long period of time and train on their traces.