We've had self-improving AIs before, and they tended to get lost after a while. That's going to be a problem. LLMs are stable because they return to a ground state with no history for each new job. Systems with persistent state have the problem of keeping that state sane. Remember Microsoft's 2016 chatbot that learned from Twitter? [1]
[1] https://spectrum.ieee.org/in-2016-microsofts-racist-chatbot-...
You can retrain a model and keep a ground state as a reference. It's not trivial, but Microsoft's attempt was 10 years ago and significantly less complex than what's being built now.
Interesting, what are some other self-improving AI implementations? Any that actually achieved interesting results? Obviously continuous training has been tried before, but I've never heard of anything that could turn around and actually contribute code toward its own next-generation version.
You might be interested in this graph [1], which suggests that the amount of time that AIs can run on their own has been increasing. Perhaps it will hit diminishing returns, but that seems difficult to predict.
[1] https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...