CuriouslyC, yesterday at 10:55 AM

This is the thing that kills me about SFT. It made sense when most of a model's compute went into pretraining and RL was mostly for question answering. Now that RL is driving model capabilities, it doesn't make much sense.

On the other hand, RL on deployed systems looks promising as a way to essentially JIT-optimize models. Experiments with model routers and agentic RAG have shown good results.