I've talked to a team that's doing the dark factory pattern hinted at here. It was fascinating. The key characteristics:
- Nobody reviews AI-produced code, ever. They don't even look at it.
- The goal of the system is to prove that the system works. A huge amount of the coding agent work goes into testing and tooling and simulating related systems and running demos (sketched below).
- The role of the humans is to design that system - to find new patterns that can help the agents work more effectively and demonstrate that the software they are building is robust and effective.
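To give a feel for the shape of this (purely my own sketch, not something they showed me; every command, path, and check below is invented), the gate that replaces human review is roughly:

    # Sketch of a "no human ever reads the diff" merge gate: an agent's branch
    # is promoted only if the whole verification suite passes. All names invented.
    import subprocess
    import sys

    CHECKS = [
        ["pytest", "--maxfail=1"],        # unit + integration tests
        ["pytest", "tests/simulation"],   # agent-written simulators of adjacent systems
        ["python", "demos/run_all.py"],   # scripted end-to-end demos with assertions
    ]

    def gate(branch: str) -> bool:
        subprocess.run(["git", "checkout", branch], check=True)
        for cmd in CHECKS:
            if subprocess.run(cmd).returncode != 0:
                return False              # reject; no human debugs it, an agent retries
        return True

    if __name__ == "__main__":
        sys.exit(0 if gate(sys.argv[1]) else 1)

The human leverage is entirely in deciding what belongs in that CHECKS list, not in reading diffs.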
It was a tiny team, and the stuff they had built in just a few months looked very convincing to me. Some of them had 20+ years of experience as software developers working on systems with high reliability requirements, so they were not approaching this from a naive perspective.
I'm hoping they come out of stealth soon because I can't really share more details than this.
We're going to need to become a lot more creative about what and how we test if we're ever to reach dark factory levels. Unit tests and integration tests are one thing, but truly testing against everything in a typical project requirements document is another thing.
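As one example of "more creative": lift a sentence out of the requirements document and turn it into a property that a generator hammers, instead of a handful of hand-picked unit-test cases. A minimal sketch using Hypothesis, where the requirement wording and the exporter/importer are toy stand-ins I made up:

    # Requirement (invented for illustration): "Exported reports must round-trip
    # through the importer without losing any line items."
    from hypothesis import given, strategies as st

    # toy stand-ins for the real system under test
    def export_report(items: list[int]) -> str:
        return ",".join(map(str, items))

    def import_report(blob: str) -> list[int]:
        return [int(x) for x in blob.split(",")] if blob else []

    @given(st.lists(st.integers()))
    def test_report_round_trip(items):
        assert import_report(export_report(items)) == items

Even that only covers the requirements you can phrase as invariants; scaling it to everything in a typical requirements document is the hard part.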
What is the AI analog of Tesla's current level of robotaxi autonomy, where there's a "safety monitor" in the passenger seat, or, sans safety monitor, a trailing guide car[1] and a remote driver in Mumbai[2]?
[1] https://electrek.co/2026/01/22/tesla-didnt-remove-the-robota...
[2] https://insideevs.com/news/760863/tesla-hiring-humans-to-con...
Having actually run some of the software produced by nearly "dark software factories," I can tell you a lot of that software is complete shit.
Yegge's Beads is a genuinely good design, for example, but it's flakier and more broken than the Unix vendor Motif implementations in 1993, and it eats itself more often than Windows 98 would blue screen.
I can actually run a bunch of orchestrated agents, and get code which isn't complete shit. But it's an extremely skill-intensive process, because I'm acting as product manager, lead engineer, and the backstop for the holes in the cognition of a bunch of different Claudes.
So far, the people promising completely dark software factories are either high on their own supply, or grifting to sell books (or occasionally crypto). Or so I judge from using the programs they ship.
The autopilot analogy is a good one because levels 4 and 5 are essentially vaporware outside of successes in controlled environments backed by massive investment and engineering.
One of the other authors he links to[0] brags that he's released 10 projects in the past month, like "Super Xtreme Mapper, a high-end, professional MIDI mapping software for professional DJs", which has 4 stars on GitHub. Despite the "high-end, professional...for professional" description, literally no one is going to use it, because this guy can't [be trusted to] maintain this software. Even if Claude Code is doing all the work, adding all the features, and fixing all the bugs, someone has to issue the command to do that work, and to foot the bill. This guy is just spraying code around and snorting digital coke.
There is plausibly something here with AI-generated code, but as always, the value is not in the first release but in the years of maintenance and maturation that make it something you can use and invest in. The problem with AI is that it's giving these people hyper-ADHD: they can't commit to anything, and no one will use vibe-coded tools; I'm betting not even the authors themselves after a month.
[0] https://nraford7.github.io/road-runner-economy/