sangwulee • 07/31/2025

We used two types of datasets for post-training: supervised finetuning (SFT) data, and preference data for the RLHF stage. You can use fewer than 1M samples and still significantly boost the aesthetics. Quality matters A LOT. Quantity helps with generalisation and with the stability of the checkpoints, though.
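To make the distinction between the two dataset types concrete, here is a minimal sketch that assumes an image-generation model (the comment mentions aesthetics). An SFT sample pairs a prompt with one target output. A preference sample pairs a prompt with a preferred and a less-preferred output. The comment does not say which preference-optimisation algorithm was used in the RLHF stage. DPO (Direct Preference Optimization) appears here only as one common way preference pairs are consumed. The record layouts and the names `SFTSample`, `PreferencePair`, `dpo_loss` and `beta` are hypothetical.

```python
import math
from dataclasses import dataclass


# Hypothetical record layouts; the actual schema is not described in the thread.
@dataclass
class SFTSample:
    prompt: str
    image_path: str  # one curated, high-quality target for the prompt


@dataclass
class PreferencePair:
    prompt: str
    chosen_path: str    # output the raters preferred
    rejected_path: str  # output the raters ranked lower


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single preference pair (illustrative only).

    Inputs are log-likelihoods of the chosen/rejected outputs under the
    policy being trained and under a frozen reference model.
    loss = -log(sigmoid(margin)) = softplus(-margin)
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return _softplus(-margin)
```

With no preference signal (margin 0) the loss is log 2. It falls below that as the policy raises the chosen output's likelihood relative to the reference, and it rises above log 2 when the rejected output is favoured instead.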

Replies

lawlessone • 08/01/2025

How is the data collected?
