While I agree that an AI system is not just the LLM, the problem for me is that LLMs alone (the ones from years ago, which were basically stateless) already look too convincingly like real human conversation at first sight.
This shows that the LLM part found a way to mimic human conversation with a mechanism that is not the same as that of a typical biological brain. You can then extend the AI system by adding things on top, but it is too late: those additions will have no incentive to recreate the mechanism from scratch. The LLM has pushed the system into a local minimum, and the rest of the system will not "go in a de-optimising direction and restart from scratch".