Is that really true, though?
First off, you’re ignoring error bars. On average, frontier models might be 99.95% accurate. But for many work streams, there are surely tail cases where a series of questions only produce 99% accuracy (or even less), even in the frontier model case.
The challenge that businesses face is how to integrate these fallible models into reliable and repeatable business processes. That doesn’t sound so different than software engineering of yesteryear.
I suspect that as AI hype continues to level-off, business leaders will come to their senses and realize that it’s more marginally productive to spend on integration practices than squeaking out minor gains on frontier models.