Hacker News

vanuatu, yesterday at 10:05 PM

all the labs "clean" their pretraining data, and you can keep your pretraining data minimally AI-generated but still spam synthetic data in post-training