One thing that makes LLMs complicated in production is that they're stateless — every call starts from zero. The complexity compounds when you need agents to maintain context across sessions and models. That's a layer that's largely missing from most stacks today.
That does not seem to be related to llms? It is more about the harness that utilizes them, right?
If you think statefull LLMs would be easier to handle then stateless... Then I think you haven't done a lot of software engineering