All that compute is not needed. We have an existence proof from biology in the form of natural intelligence that much greater efficiency is possible. However, achieving dramatic improvements in compute efficiency will depend on unpredictable scientific breakthroughs. Personally I suspect that an entirely new hardware architecture will be needed, although I don't have any hard evidence to back that up.
>from biology ... much greater efficiency is possible
Those are much more specialized models with pretty mediocre tokens per second.
>We have an existence proof from biology in the form of natural intelligence that much greater efficiency is possible.
It's only a proof that it's possible with 18+ years of training.