Hacker News

paytonjjones · today at 6:34 PM

Maybe not AGI, but if you look at the differences between, say, GPT-2 and GPT-5.5, it's remarkable how well it works to mostly just throw scale at the problem.


Replies

root_axis · today at 7:37 PM

The difference is a lot more than just throwing scale at it. Pretty much everything useful comes from an evolving landscape of post-training techniques.

Of course, parameter count and context length are also important because they increase the model's overall fidelity, but a base model without SFT, RLHF, etc. is effectively useless.

codemog · today at 7:08 PM

They already tried that with GPT-4 and GPT-4.5.

They were allegedly massive, but the returns were not worth the cost.