Coding agents and much better models. Claude Code or Codex CLI plus Claude Opus 4.5 or GPT 5.2 Codex.
The latest models and harnesses can crunch on difficult problems for hours at a time and get to working solutions. Nothing could do that back in ~March.
I shared some examples in this comment: https://news.ycombinator.com/item?id=46436885
Cool, but most developers do mundane stuff like gluing APIs and implementing business logic, which require oversight and review.
Those crunching hard problems will still review what's produced in search of issues.
I was going back and looking at timelines, and was shocked to realize that Claude Code and Cursor's default-to-agentic-mode changes both came out in late February. Essentially the entire history of "mainstream" agentic coding is ten months old.
(This helps me understand better the people who are confused/annoyed/dismissive about it, because I remember how dismissive people were about Node, about Docker, about Postgres, about Linux when those things were new too. So many arguments where people would passionately talk about how all those things were irredeemably stupid and only suitable for toy/hobby projects.)
Are there techniques though? Tech pairing? Something we know now that we didn't then? Or just better models?
Ok I will bite.
Every single example you gave is in hobby-project territory. Relatively self-contained, maintainable by 3-4 devs max, within 1k-10k lines of code. I've been successfully using coding agents to create such projects for the past year and it's great, I love it.
However, lots of us here work on codebases that are 100x, 1000x the size of these projects you and Karpathy are talking about. Years of domain-specific code. From personal experience, coding agents simply don't work at that scale the same way they do for hobby projects. Over the past year or two, I did not see any significant improvement from any of the newest models.
Building a slightly bigger hobby project is not even close to making these agents work at industrial scale.