
root_axis • today at 7:37 PM

The difference is a lot more than just throwing scale at it; pretty much everything useful comes from an evolving landscape of post-training techniques.

Of course, param count and context length are also important because they increase the model's overall fidelity, but a base model without SFT, RLHF, etc. is effectively useless.

Replies

darth_avocado • today at 8:49 PM

Correct. That is what I was trying to hint at. Yes, massive compute is needed to train AI, but it isn’t the only thing. A lot of research and experimentation goes into moving the marker just a little bit. Innovation can’t be forced into weekly sprints; it takes its own time.


alt Hacker News
