paytonjjones • today at 6:34 PM • 2 replies

Maybe not AGI, but if you look at the differences between, say, GPT-2 and GPT-5.5, it's remarkable how well it works to mostly just throw scale at the problem.

Replies

root_axis • today at 7:37 PM

The difference is a lot more than just throwing scale at it; pretty much everything useful comes from an evolving landscape of post-training techniques.

Of course, param count and context length are also important because they increase the model's overall fidelity, but a base model without SFT, RLHF, etc. is effectively useless.

codemog • today at 7:08 PM

They already tried that with GPT-4 and GPT-4.5.

They were allegedly massive, but the returns were not worth the cost.
