If it's code that you can tolerate being somewhat messy and suboptimal, you can run agents e2e. If it's a critical piece of code that has become part of your identity, better to do the PR work and scrutinize it well. LLMs are still next-token predictors, no matter how many harnesses, hooks, skills, and tools are attached to them. An LLM only knows that these are callable; interpreting the state and handling mitigation are still best effort.