Maybe not AGI, but if you look at the differences between, say, GPT-2 and GPT 5.5, it's remarkable how well it works to mostly just throw scale at the problem.
They already tried that with GPT-4 and GPT-4.5.
Both were allegedly massive, but the returns did not justify the cost.
The difference is a lot more than just throwing scale at it; pretty much everything useful comes from an evolving landscape of post-training techniques.
Of course, param count and context length are also important because they increase the model's overall fidelity, but a base model without SFT, RLHF, etc. is effectively useless.