logoalt Hacker News

root_axistoday at 7:37 PM1 replyview on HN

The difference is a lot more than just throwing scale at it, pretty much everything useful comes from an evolving landscape of post-training techniques.

Of course, param count and context length are also important because they increase the model's overall fidelity, but a base model without SFT, RHLF etc is effectively useless.


Replies

darth_avocadotoday at 8:49 PM

Correct. That is what I was trying to hint at. Yes, massive compute is needed to train ai, but it isn’t the only thing. A lot of research and experimentation goes into moving the marker just a little bit. Innovation can’t be forced into weekly sprints, it takes its own time.

show 1 reply