I get that. But let’s say you have a glass: you fill it to one third, then to half, then to three quarters, then to full. Can you expect to fill it beyond full? Not every process has an infinite ramp.
It seems frontier labs have been throwing all the compute and all the data they could get their hands on at model training for at least the past 2 years. Is that glass a third full or is it nearly full already?
Is the process of filling that particular glass linear, or does the top 20% of the glass require X times as much water to fill as the bottom 20%?
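FWIW, the empirical scaling-law results suggest the glass fills on a power law, not linearly: loss falls roughly as compute^(-alpha) for some small exponent. Here’s a minimal sketch of what that implies; the constants are made up for illustration, not fitted to any real model:

```python
# Toy power-law scaling curve: loss(C) = a * C**(-alpha).
# Both constants are purely illustrative assumptions.
a = 10.0       # hypothetical scale constant
alpha = 0.05   # hypothetical (small) scaling exponent

def loss(compute: float) -> float:
    """Loss under the toy power-law curve."""
    return a * compute ** (-alpha)

def compute_for_loss(target: float) -> float:
    """Invert the curve: compute needed to reach a target loss."""
    return (a / target) ** (1 / alpha)

# Compute needed for each successive 10% reduction in loss:
start = loss(1e6)
for i in range(1, 4):
    target = start * (1 - 0.1 * i)
    print(f"loss {target:.3f} -> compute {compute_for_loss(target):.3e}")
```

With an exponent that small, each equal-sized cut in loss costs roughly an order of magnitude more compute, which is the “top 20% of the glass” in the analogy.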
I don’t see how that analogy makes any sense. We’re not talking about containers of a known and fixed size here, nor about a single technique or method. LLMs built on Transformer architectures might have reached a plateau, for instance, but there are tons of techniques _around_ those models that keep making them more capable (o1, etc.), and other architectures as well.