ting0 • today at 8:22 AM • 1 reply

You have a problem with generating synthetic data from test questions? Humans simulate experiences in their mind. What's the problem?

Replies

BoorishBears • today at 8:37 AM

Models don't generalize as well as humans.

Synthetic data is fine. Synthetic data on very similar questions, generated from the description, is typically fine too. But once the shape of what you're training on gets too close to the actual holdout questions, you get an uplift that won't carry over to unseen tasks.
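
To make "too close to the holdout questions" concrete, here is a rough sketch of one way to screen synthetic items, assuming word-level trigram overlap is a reasonable proxy for "shape". The function names, the Jaccard measure, and the 0.5 threshold are all illustrative choices, not anything from an established pipeline, and surface overlap will miss paraphrases.

```python
def ngrams(text, n=3):
    """Return the set of word-level n-grams in a lowercased text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def max_overlap(candidate, holdout, n=3):
    """Highest n-gram Jaccard overlap between a candidate and any holdout question."""
    cand = ngrams(candidate, n)
    return max((jaccard(cand, ngrams(q, n)) for q in holdout), default=0.0)


def filter_synthetic(synthetic, holdout, threshold=0.5, n=3):
    """Keep only synthetic items whose overlap with every holdout question is below threshold.

    The threshold is a hypothetical value; it would need tuning per benchmark.
    """
    return [s for s in synthetic if max_overlap(s, holdout, n) < threshold]


# Made-up example data, purely for illustration.
holdout = ["What is the capital of France and when was it founded"]
synthetic = [
    "What is the capital of France and when was it built",  # near-copy of a holdout item
    "Name the longest river in South America",               # unrelated question
]
kept = filter_synthetic(synthetic, holdout)
print(kept)
```

The near-copy shares 8 of 10 distinct trigrams with the holdout question, so it gets dropped, while the unrelated question passes through. A check like this only catches lexical closeness; it says nothing about items that are structurally identical but reworded.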
