Claude Code and other AI coding tools must have a *mandatory* hook for verification.
For the frontend, verification means making sure the UI looks as expected (an image model can check this) and that clicking certain buttons produces the expected results (ChatGPT agent could check this, but I guess it isn't public).
For the backend, it means firing API requests one by one and verifying the responses.
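As a rough illustration of "fire requests one by one and verify the results", here is a minimal standard-library sketch. The stub server, its endpoints (`/health`, `/items/1`) and the expected bodies are hypothetical stand-ins for a real backend; in practice the agent would point `verify` at a deployed test environment instead of a local stub.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the backend under test."""

    def do_GET(self):
        if self.path == "/health":
            self._reply(200, {"status": "ok"})
        elif self.path == "/items/1":
            self._reply(200, {"id": 1, "name": "widget"})
        else:
            self._reply(404, {"error": "not found"})

    def _reply(self, status, body):
        data = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # silence per-request logging


def start_stub():
    """Start the stub on a free local port; return (server, base_url)."""
    server = HTTPServer(("127.0.0.1", 0), StubHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}"


def fire(base_url, method, path):
    """Send one request and return (status code, decoded JSON body)."""
    req = urllib.request.Request(base_url + path, method=method)
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the error still carries status and body.
        return err.code, json.loads(err.read())


def verify(base_url, checks):
    """Fire the checks one by one and return a list of mismatch descriptions."""
    failures = []
    for method, path, want_status, want_body in checks:
        status, body = fire(base_url, method, path)
        if (status, body) != (want_status, want_body):
            failures.append(
                f"{method} {path}: got {status} {body}, want {want_status} {want_body}"
            )
    return failures


# Hypothetical expectations for the stub above.
CHECKS = [
    ("GET", "/health", 200, {"status": "ok"}),
    ("GET", "/items/1", 200, {"id": 1, "name": "widget"}),
    ("GET", "/items/999", 404, {"error": "not found"}),
]

if __name__ == "__main__":
    server, base_url = start_stub()
    try:
        print("\n".join(verify(base_url, CHECKS)) or "all checks passed")
    finally:
        server.shutdown()
        server.server_close()
```

Returning a list of mismatches rather than stopping at the first one gives the agent every failure in one pass, which is more useful feedback than a single assertion error.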
To make this easier, we need to give Claude (or whatever agent) an environment to run these verifications in, and that is the missing piece. Claude Code and Codex should start shipping verification environments that make it easy to verify frontend and backend tasks, and I think Antigravity already helps a bit here.
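Part of the "mandatory hook" can already be approximated with Claude Code's hooks, which run shell commands on lifecycle events such as `Stop`. The sketch below assumes the documented convention that a command hook exiting with code 2 blocks the action and feeds its stderr back to the model (check the hooks documentation for the exact settings schema). The commands in `VERIFY_COMMANDS` are hypothetical placeholders for a project's real UI or API checks.

```python
import subprocess
import sys

# Hypothetical verification commands: swap in the project's real test suite,
# UI checks, or API smoke checks.
VERIFY_COMMANDS = [
    [sys.executable, "-c", "print('smoke checks passed')"],
]


def run_checks(commands):
    """Run each command in order; return (command, combined output) for each failure."""
    failures = []
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append((" ".join(cmd), result.stdout + result.stderr))
    return failures


def main():
    """Return 0 if every check passed, otherwise report on stderr and return 2."""
    failures = run_checks(VERIFY_COMMANDS)
    for cmd, output in failures:
        print(f"verification failed: {cmd}\n{output}", file=sys.stderr)
    # Assumed hook convention: exit code 2 blocks the agent and hands stderr
    # back to it, so it has to fix the failure instead of declaring success.
    return 2 if failures else 0


if __name__ == "__main__":
    exit_code = main()
    if exit_code:
        sys.exit(exit_code)
```

The script itself is the easy part; what it cannot provide is the environment the checks run against, which is exactly the gap described above.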
------
The thing about backend verification is that it differs from company to company and requires a custom implementation that can't easily be shared. Each company has its own way of deploying things.
Imagine a concrete task: create a new service that reads from a data stream, transforms the records, and writes them to another data stream, where a second new service consumes the transformed data and writes it to an AWS database such as Aurora.
```
stream -> service (transforms) -> stream -> service -> Aurora
```
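To make the point that the code itself is small, here is a minimal in-memory model of the pipeline. It assumes `queue.Queue` as a stand-in for each stream and a plain dict as a stand-in for the Aurora table; the record fields and the `transform` logic are hypothetical. Everything the agent actually struggles with (real stream clients, schema registry, deployment, provisioning the database) is deliberately absent.

```python
import queue


def transform(record):
    """Hypothetical transformation: normalize the country, convert dollars to cents."""
    return {
        "order_id": record["order_id"],
        "country": record["country"].strip().upper(),
        "amount_cents": round(record["amount_usd"] * 100),
    }


def transformer_service(source, sink):
    """Service 1: drain the input stream, transform each record, publish downstream."""
    while not source.empty():
        sink.put(transform(source.get()))


def writer_service(source, table):
    """Service 2: drain the transformed stream and upsert each row by primary key."""
    while not source.empty():
        row = source.get()
        table[row["order_id"]] = row


def run_pipeline(records):
    """stream -> service (transforms) -> stream -> service -> table, all in memory."""
    raw_stream, transformed_stream, table = queue.Queue(), queue.Queue(), {}
    for record in records:
        raw_stream.put(record)
    transformer_service(raw_stream, transformed_stream)
    writer_service(transformed_stream, table)
    return table


if __name__ == "__main__":
    print(run_pipeline([{"order_id": 1, "country": " us ", "amount_usd": 12.5}]))
```

An end-to-end check against this model (push known records in, assert on the final table) is the same shape of verification the agent would want to run against the real test environment.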
To one-shot this, Claude Code must know everything about how the company works:
- How does one consume streams in the company? Is there a schema registry?
- How does one create a new service and register its dependencies? How does one deploy it to the test environment and to production?
- How does one even create an Aurora DB? Request approvals, IAM roles, etc.?
My question is: what would it take for Claude Code to one-shot this? At the code level it isn't too hard, and it fits easily in the context window, but the *main* problem is the fragmented processes for creating the infrastructure and the operations behind it, which are human-driven today (and need not be!).
-----
My prediction is that companies will build something like a new "agent" environment in which all these processes (which used to require a human) can be carried out by an agent without human intervention.
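One hypothetical shape for such an environment: the agent submits structured infrastructure requests, and a policy check replaces the human approver for anything inside agreed guardrails. Every name below (`InfraRequest`, `AUTO_APPROVE`, `review`) is invented for illustration, and where the guardrails sit is a per-company decision, not something this sketch settles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InfraRequest:
    """Hypothetical request an agent submits to the platform."""
    kind: str         # e.g. "aurora_db", "stream", "service"
    environment: str  # e.g. "test" or "prod"


# Hypothetical guardrails: request types the platform approves without a human.
AUTO_APPROVE = {
    ("aurora_db", "test"),
    ("stream", "test"),
    ("service", "test"),
}


def review(request: InfraRequest) -> str:
    """Return 'approved' for requests inside the guardrails, else 'needs_human'."""
    if (request.kind, request.environment) in AUTO_APPROVE:
        return "approved"
    return "needs_human"
```

As the allowed set grows, fewer steps fall back to a person, which is the direction the prediction points in.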
I'm still thinking about other solutions here, but if anyone figures it out, please tell me!