Hacker News

zackangelo, last Wednesday at 2:54 PM

There might be a plateau coming but I’m not sure that will be the reason.

It seems counterintuitive but there is some research suggesting that using synthetic data might actually be productive.


Replies

jsheard, last Wednesday at 2:59 PM

I think there's probably a distinction to be made between deliberate, careful use of synthetic data and blindly scraping 1 PB of LLM-generated SEO spam and force-feeding it into a new model. Maybe the former is useful, but the latter... probably not.