meander_water • today at 9:46 AM

I think most labs actively create synthetic data with an existing model and include it in the pretraining mix for their next model.

Would love to know exactly what the latest process is to keep slop out of training data.

martinald • today at 11:17 AM

const isAiContent = (str) => str.includes('—');?


alt Hacker News