I think there's probably a distinction to be made between deliberate, careful use of synthetic data, as opposed to blindly scraping 1PB of LLM generated SEO spam and force-feeding it into a new model. Maybe the former is useful, but the latter... probably not.
I think there's probably a distinction to be made between deliberate, careful use of synthetic data, as opposed to blindly scraping 1PB of LLM generated SEO spam and force-feeding it into a new model. Maybe the former is useful, but the latter... probably not.