logoalt Hacker News

ACCount37yesterday at 9:43 PM0 repliesview on HN

That's sort of true. SFT isn't too expensive - the per-token cost isn't far off from that of pre-training, and the pre-training dataset is massive compared to any SFT data. Although the SFT data is much more expensive to obtain.

RL is more expensive than SFT, in general, but still worthwhile because it does things SFT doesn't.

Automated evaluation is massive too - benchmarks are used extensively, including ones where LLMs are judged by older "reference" LLMs.

Using AI feedback directly in training is something that's done increasingly often too, but it's a bit tricky to get it right, and results in a lot of weirdness if you get it wrong.