Hacker News

bluehark 07/31/2025

How large was the dataset used for post-training?


Replies

sangwulee 07/31/2025

We used two types of datasets for post-training: supervised fine-tuning (SFT) data, and preference data for the RLHF stage. You can actually use fewer than 1M samples to significantly boost the aesthetics. Quality matters A LOT. Quantity does help with generalisation and with the stability of the checkpoints, though.
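To make the two dataset types concrete, here is a minimal sketch. The reply does not describe the actual schema or say how the preference data was consumed in the RLHF stage. The record layouts and field names below (`SFTSample`, `PreferencePair`, `target`, `chosen`, `rejected`) are hypothetical. The loss assumes one common approach, training a reward model on preference pairs with a Bradley-Terry pairwise loss, and is shown purely for illustration, not as the method used here.

```python
import math
from dataclasses import dataclass


# Hypothetical SFT record: a prompt paired with one curated, high-quality output.
@dataclass
class SFTSample:
    prompt: str
    target: str  # e.g. an ID or path for the curated output (assumed field)


# Hypothetical preference record: two outputs for the same prompt, ranked by raters.
@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # output the raters preferred
    rejected: str  # output the raters liked less


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry loss -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the reward model scores the preferred output
    higher than the rejected one, and grows as that ordering is violated.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch on the sign of m so that
    # exp() never receives a large positive argument and overflows.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    pair = PreferencePair(prompt="a cat on a sofa", chosen="img_a", rejected="img_b")
    # Hypothetical reward-model scores for pair.chosen and pair.rejected.
    print(pair.prompt, pairwise_preference_loss(2.0, 0.5))
```

Because each preference pair yields one training signal per comparison, a few hundred thousand carefully rated pairs can already shape a reward model. That fits the point above that quality matters more than raw sample count.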
