> Language model capability at generating text output.
That's not a quantifiable claim. Unless you put it in numbers, anyone can argue exponential or not.
> next gen models are significantly harder to build.
That's not how we judge capability progress though.
> Remind me what was so great about gpt 5? How about gpt4 from gpt 3?
> Do you even remember the releases?
At the GPT-3 level, we could generate some reasonable code blocks and tiny features. (An example passed around at the time was "explain what this function does" for a `fib(n)`.) At GPT-4, we could build features and small apps. At GPT-5, you can often one-shot a whole app from a vague description. The difference between them is massive for coding capability. Sorry, but if you can't remember that massive change... why are you making claims about the progress in capabilities?
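For context, the snippet in those "explain this function" demos was roughly the classic recursive Fibonacci (reconstructed here from memory; the exact code shown varied):

```python
def fib(n):
    # Classic recursive Fibonacci: GPT-3-era demos asked the model
    # to explain what this function computes.
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(10))  # 55
```

Explaining a five-line function was impressive then; that's the baseline the later jumps should be measured against.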
> Multimodal add ons that no one asked for
Not only does multimodal input training improve the model overall, it's also directly useful, for example for feeding screenshots back to the model during development.