"In this post, I’ll cover a third, not-so-obvious approach: building ways for the agent to validate more of its own work before a human has to step in. "
This has been an obvious thing to do since at least January (since Geoffrey Huntley published "everything is a ralph loop"), and it is how I've been working: build enough orchestration tooling to automate everything: development container bring-up, the build, the unit tests, the integration tests, and finally using the software the way an end user would. Then, to iterate, set performance goals on top of that already solid base so the automated agent ("gym") can iterate autonomously and let you know when it's "done".
I understand this probably does not work if you're on some subscription and not using the API (tokens burn fast), but this has been extremely productive for me.
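For anyone wondering what that kind of orchestration can look like, here is a minimal sketch, not my actual tooling. It assumes each stage is just a command whose exit code says pass or fail; the stage names and the `py(...)` placeholder commands are made up for illustration and would be swapped for your real container, build, and test commands.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class StageResult:
    name: str
    passed: bool
    output: str


def run_stages(stages):
    """Run (name, argv) stages in order and stop at the first failure."""
    results = []
    for name, argv in stages:
        proc = subprocess.run(argv, capture_output=True, text=True)
        passed = proc.returncode == 0
        results.append(StageResult(name, passed, proc.stdout + proc.stderr))
        if not passed:
            break  # later stages mean nothing once an earlier one fails
    return results


def py(code):
    """Placeholder command: a Python one-liner standing in for a real tool."""
    return [sys.executable, "-c", code]


# Hypothetical stage list (assumption): replace each placeholder with a real command.
STAGES = [
    ("container", py("print('container up')")),
    ("build", py("print('build ok')")),
    ("unit", py("print('unit tests ok')")),
    ("integration", py("import sys; print('integration failed'); sys.exit(1)")),
    ("end_user", py("print('never reached')")),
]

if __name__ == "__main__":
    for result in run_stages(STAGES):
        print(f"{result.name}: {'PASS' if result.passed else 'FAIL'}")
```

The captured output of the failing stage is what gets handed back to the agent, so it has something concrete to fix on the next pass.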
This is where most of my productivity gains have come from. I have a special harness that I now move from project to project to orchestrate my testing. A lot of my work day is setting up a prompt or two early and letting them loop until they return evidence that the feature works, having gone through the big QA loop.
I've slowly been optimizing for token use throughout the stack, and Claude ends up making very tight for loops for most of the process, which keeps the token count even lower. It's been nice. A lot of my toil at work is just gone.
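A rough sketch of the "loop until it returns evidence" idea, under some assumptions: `attempt` stands in for whatever invokes the agent, `verify` stands in for the QA run and returns a pass flag plus a report, and there is a hard iteration cap so a stuck loop cannot burn tokens forever. All of these names are hypothetical, not taken from any real harness.

```python
from dataclasses import dataclass, field


@dataclass
class LoopOutcome:
    done: bool
    iterations: int
    evidence: list = field(default_factory=list)


def loop_until_verified(attempt, verify, max_iterations=5):
    """Alternate attempt(feedback) and verify() until verify passes or the budget runs out."""
    feedback = ""
    evidence = []
    for i in range(1, max_iterations + 1):
        attempt(feedback)          # agent works, seeing the last failure report
        passed, report = verify()  # QA run returns (passed, report)
        evidence.append(report)    # keep every report as the audit trail
        if passed:
            return LoopOutcome(True, i, evidence)
        feedback = report
    return LoopOutcome(False, max_iterations, evidence)


# Toy stand-ins (assumption): the "agent" succeeds on its third attempt.
state = {"fixes": 0}


def fake_attempt(feedback):
    state["fixes"] += 1


def fake_verify():
    ok = state["fixes"] >= 3
    status = "all checks passed" if ok else "checks failing"
    return ok, f"attempt {state['fixes']}: {status}"


outcome = loop_until_verified(fake_attempt, fake_verify)
```

Feeding only the latest report back in, rather than the whole history, is one simple way to keep the per-iteration token count down.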
You can get really far with the 20x Claude Code and Codex plans. They are many orders of magnitude cheaper than API calls.