2025 xmas day, was at my wife's parents' house in rural Japan, my kids were all playing wi...

veidr • today at 2:02 AM • 0 replies • view on HN

2025 xmas day, was at my wife's parents' house in rural Japan, my kids were all playing with their cousins, I was posted up with my laptop just listening to some podcast about the benefits of making time for long walks in middle age (as if! ~lol) while running another "agentic team" experiment — 12 agents in parallel.

I'd been feeding these bots a few projects, over and over — the hard part was the feeding them — that is, giving them enough well-defined work to do. They weren't yet good enough to write real software you could keep — at least I'd never seen that — and my experiments were just about finding the edges, building my intuition, and playing with processes that might be useful someday.

These things had built my kids' weird magical-dominoes games a few times by that point — but the experiment had been repeated so many times that you could argue we had "written" that software in English, with a spec that had been built, reworked, and rebuilt many times.

But this time, the bots were building me a bespoke git client, unlike any other, and unlike anything I would take the time to write — waaaay to complicated, with too little benefit. I wanted it, but only for this one niche use case.

It was a GUI client to manage a collection of repos, about 200 of them in a monorepo where every subproject was a git submodule , which are the universal counterpart to node_modules — while the latter is notorious for being "the heaviest object in the universe", git submodules are widely acknowledged to be the most annoying objects in the universe.

Nevertheless, I had this weird monorepo, and I wanted to visualize and do stuff to this list of independent repos that were also git submodules of the parent monorepo: sort by outstanding commits, divergence from upstream, recency of activity, etc. Visualize them differently based on these things. Search across them, including the source code on branches other than the current one. Show the branch counts and number of branches and commits that existed locally but not pushed upstream. A bunch more boring stuff like that, but done across the full set of repos.

That project itself wasn't even interesting to me; that software would be marginally useful to me if it existed and worked, but the main point it was just a large enough chunk of work to keep a team of bots busy all day without a human in the loop.

In December 2025, AI coding agents were already useful with a human in the loop. Opinions varied a lot about how useful they were, but to me it was obvious we were going to use them for the rest of our careers as software engineers.

It was not yet obvious that we were going to let them write huge swaths of code, or entire programs, without any humans in the loop. I had never seen that produce something that worked well enough to be worth keeping.

And then, that day, I did. I had structured the workflow so that the git client was on the screen and auto-refreshing. I was listening to the podcast, drinking coffee, reading the news. The git client was a crude window with a table in the background, a single column showing the full path to each repo, and nothing else.

Then the table expanded. It got color coded numbers representing the commit/branch counts. It suddenly gained styles, and looked nice. A contextual menu started popping up, repeatedly, and grew to include several more menu items over the next few minutes. New confirmation dialogs popped up as the bots implemented and exercised the various features from my spec.

I remember my field of vision narrowing as I started to focus on what the bots were doing. They were just executing my loop — one bot would implement one bullet from my spec, another bot would review the code while another bot manually tested it, and tried to break it, run a code review gauntlet in a loop until there were no more findings, repeat.

I could see the progress play out on my screen as they worked. I had watched bot teams work before, but it had always been pretty janky, and something like a bad game that nobody would play, or a stupid to-do-list app, or — more often — something that didn't actually work.

This was the first time I had ever seen it work. This was the grail we'd been looking for, not sure if it really existed: a fleet of bots successfully building a piece of complex, useful software without human assistance. I could tell it was working, because the adversarial testing and usability checks were all happening right before my eyes.

So it _is_ possible, I thought to myself.

They did it all morning. The app worked. I used it every day after that, for several weeks, until I finally got that entire monorepo converted to a more sensible git subtree-based arrangement.

In the half year since then I've been in a kind of manic state some of my friends call cyberpsychosis, chasing that dream. I've now seen agentic fleets successfully build many things. I've also seen a bunch of failures, some subtle, some catastrophic and hilarious. I'm still building my intuition, and the laws of physics in this universe are mutating every few weeks. It's wild.

I am fortunate enough to work at a place that doesn't pressure engineers to climb a token leaderboard, or to use AI beyond what we deem prudent. This kind of agentic no-humans-in-the-loop coding is prohibited. The policy is that in this era where we all generate more code than ever, even by hand, it's the quality bar that must go up, not the speed of production.

That's awesome because it keeps me grounded in the old ways, and confines my cyberpsychosis to my weekends and evenings. I usually spend the weekend building up a couple software plans, honing them as best I can, and then unleashing the clankers Sunday night.

I'll let them run all week, sometimes giving them a poke or flipping them over a couple time in the evening, and then the next Saturday morning, I see what I've got. What I'm mainly interested in is: How can agentic fleet-coding processes evolve to produce better software and require less human interaction and inspection? And the corollary: How can software architectures evolve to safely consume more of this fundamentally untrustable code?

It's thrilling. Exhilarating. The near-infinite subsidized tokens are about to finally run out this month, alas. But for the past 6 months it's easily the best $400/month I have ever spent. :)

alt Hacker News