I think a lot of use have implemented our own ad hoc self-improvement checks into our agentic workflows. My observations are the same as yours.