Hacker News

lawlessone 08/01/2025

How is the data collected?


Replies

sangwulee 08/01/2025

The highest-quality finetuning data was hand-curated internally. I would say our post-training pipeline is quite similar to the SeedDream 2.0 ~ 3.0 series from ByteDance. Like them, we use extensive quality filters and internal models to get the highest quality possible. Even after that filtering, we still hand-curate a subset.
