Makes me wonder: as people grow to trust the AI more and more, not reading the code, barely skimming the implementation plans, and simply rerolling if something doesn't work, will the value of these chats erode? Thinking back 1-1.5 years, I was closely monitoring what these agents did and steering them quite aggressively. These days, not so much. Where will RL signals come from as it gets ever closer to human capabilities? How well does self-play work for coding work? What about multistep tasks where it isn't just about being good at a single task, but about evolving a codebase over time in the face of changing requirements?
Not sure, but in my experience, instead of asking for code, I'm asking for solutions and providing a kubectl configured to reach my cluster and the az monitor command to read the logs and telemetry.
A typical session is the agent establishing a metrics and log baseline, creating the code, compiling, deploying, observing, fixing, redeploying, observing metrics, determining the outcome and committing.
I really, really don't look at the code anymore.
UPDATE:
So my point is: it won't have me stewarding the code anymore, but it will have the infrastructure (and ultimately the real world) providing feedback on the traces.
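A rough sketch of the session loop described above, just to make the feedback path concrete. Everything here is hypothetical: the four callables stand in for whatever the agent actually does (reading telemetry, building and deploying, patching, committing), and the single "error rate, lower is better" metric with a 5% tolerance is an assumption, not how any particular agent works.

```python
from typing import Callable


def run_session(
    read_error_rate: Callable[[], float],         # hypothetical: reads one metric from telemetry
    build_and_deploy: Callable[[], None],         # hypothetical: compile and roll out the change
    propose_fix: Callable[[float, float], None],  # hypothetical: agent patches the code
    commit: Callable[[], None],                   # hypothetical: commit the accepted change
    max_attempts: int = 3,
    tolerance: float = 0.05,
) -> bool:
    """Deploy, observe, and fix until the metric is no worse than the baseline."""
    # Establish the baseline before touching anything.
    baseline = read_error_rate()
    for _ in range(max_attempts):
        build_and_deploy()
        observed = read_error_rate()
        # Assumption: lower is better, and a small regression is tolerated.
        if observed <= baseline * (1 + tolerance):
            commit()
            return True
        # The infrastructure, not a human reviewer, is what rejects the change.
        propose_fix(observed, baseline)
    return False
```

The point of the shape is that the accept/reject decision comes from the observed metric, so nobody has to read the diff for the loop to terminate.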
The only reason I still read the output at my day job is that I still need to send it to another human for review, and I'd be embarrassed and ashamed if I let some slop through. For my hobby projects... there are definitely parts where I don't know how they work.
Maybe we need some form of long-term training signal: how long does the code that the AI wrote stick around before being rewritten?
I guess we could do this retroactively too: if we could somehow tag AI-written lines of code in the VCS, then in a couple of years we could check which parts lasted.
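Something like this could compute the "which parts lasted" number, assuming the tagging problem is already solved. The `TaggedCommit` record and the idea of marking AI-written commits are hypothetical; `blame_shas` is assumed to be the commit hash attributed to each line currently in HEAD, which you could probably extract from `git blame --line-porcelain` output.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class TaggedCommit:
    """A commit known (by some hypothetical tag) to be AI-written."""
    sha: str
    authored: date
    lines_added: int


def survival_rate(
    ai_commits: Iterable[TaggedCommit],
    blame_shas: Iterable[str],
    today: date,
    min_age_days: int = 365,
) -> Optional[float]:
    """Fraction of AI-added lines, from sufficiently old commits, still in HEAD."""
    surviving = Counter(blame_shas)  # lines in HEAD attributed to each commit
    added = alive = 0
    for c in ai_commits:
        # Skip commits too young to say anything about longevity.
        if (today - c.authored).days < min_age_days:
            continue
        added += c.lines_added
        # Cap at lines_added in case blame attributes moved lines oddly.
        alive += min(surviving[c.sha], c.lines_added)
    return alive / added if added else None
```

For example, a tagged commit that added 10 lines two years ago and still owns 4 lines in HEAD gives a survival rate of 0.4. It says nothing about why the other 6 were rewritten, which is a real limitation of the signal.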
Over a large sample size, simply getting feedback of "Did this work for me, y/n" is valuable even if the specific details are missing and even if the overall tasks are complicated and multifaceted.
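To put a number on why sample size matters here: one way (my choice, not the only one) to treat "did this work, y/n" is as a binomial proportion and look at a Wilson score interval. The function name and the 95% z-value are just illustrative defaults.

```python
import math
from typing import Optional, Tuple


def wilson_interval(
    yes: int, total: int, z: float = 1.96
) -> Optional[Tuple[float, float]]:
    """Wilson score interval for the success rate implied by y/n votes."""
    if total == 0:
        return None
    p = yes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


# Same 80% "it worked" rate, very different certainty.
small = wilson_interval(80, 100)
large = wilson_interval(8000, 10000)
```

With 100 votes the interval is several points wide on each side of 80%; with 10,000 votes it shrinks to under a point, even though each individual vote carries no detail about what went right or wrong.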