The article very much resonates with my experience over the past several months.
The project I work on has been steadily growing for years, but the number of engineers taking care of it has stayed the same or even declined a bit. Most features are isolated and left untouched for months unless something comes up.
So far, I've managed the growing scope by relying more and more on tests. Then I switched to developing exclusively against a simulator. Checking changes against the real system became rare and more involved; when you do have to check, it's usually the gnarliest parts.
Last year, I noticed I could no longer answer questions about several features because, despite working on them for a couple of months and reviewing the PRs, I barely held the details in my head soon afterwards. And this was all before coding agents penetrated deep into our process.
With agents, I noticed exactly what the article talks about. Reviewing a PR feels even more superficial; I have to exert deliberate effort because the tacit knowledge of the context hasn't formed yet, and you have to review more than before, so the stuff goes in one ear and out the other. My teammates report a similar experience.
Currently we are trying various approaches to deal with that, but it's still too early to tell. We now commit agent plans alongside the code so we hopefully don't lose the insights gained during development. Tasks with vague requirements, most of which we'd previously have understood implicitly, are now a bottleneck: typing the requirements into an agent for planning immediately surfaces the kinds of issues you'd otherwise only think of during backlog grooming. Skill MDs are often dumps of tacit knowledge we previously kept distributed in less formal ways. Agents are forcing us to up our process game and discipline, and real people benefit from that too. As the article mentions, I am looking forward to tools picking up some of that slack.
One other thing that surprised me was that my eng manager seemed oblivious to my ongoing complaints about growing cognitive load and confusion. It's as if the concept was alien to them, or they couldn't comprehend that other people handle it at a different capacity than they do.
My team has experienced this over the past 6 months for sure.
The core of the article is: “AI-assisted development potentially short-circuits this replenishment mechanism. If new engineers can generate working modifications without developing deep comprehension, they never form the tacit knowledge that would traditionally accumulate. The organization loses knowledge not just through attrition but through insufficient formation.”
But is it possible this phenomenon is transient?
Isn’t part of the presumed value-add of LLM coding agents in the meta-realm around coding? E.g., that well-structured human+LLM-generated code (greenfield in particular) will be organized in such a way that the human won’t have to develop deep comprehension until it’s needed (say, for a bug fix or optimization), and then only for a working set of the code, with the LLM bringing the person up to speed on the working set in question and also providing the architectural context to frame it properly?
The whole premise of the post, that coders remember what and why they wrote things from 6 months ago, is flawed.
We've always had the problem that understanding while writing code is easier than understanding code you've written. This is why, in the pre-AI era, Joel Spolsky wrote: "It's harder to read code than to write it."
Not to disagree with anything the article talks about but to add some perspective...
The complaint about "code nobody understands" because of accumulating cognitive debt also happened with hand-written code. E.g. some stories:
- from https://devblogs.microsoft.com/oldnewthing/20121218-00/?p=58... : >Two of us tried to debug the program to figure out what was going on, but given that this was code written several years earlier by an outside company, and that nobody at Microsoft ever understood how the code worked (much less still understood it), and that most of the code was completely uncommented, we simply couldn’t figure out why the collision detector was not working. Heck, we couldn’t even find the collision detector! We had several million lines of code still to port, so we couldn’t afford to spend days studying the code trying to figure out what obscure floating point rounding error was causing collision detection to fail. We just made the executive decision right there to drop Pinball from the product.
- and another about the Oracle RDBMS codebase from https://news.ycombinator.com/item?id=18442941
(That hn thread is big and there are more top-level comments that talk about other ball-of-spaghetti projects besides Oracle.)
I think we might as well just go all in at this point: "LGTM, LLM". The industry always overshoots and then self-corrects later. So maybe the right way to help it reach a saner equilibrium is to forget about the code altogether and focus on other ways to constrain it and ensure correctness, and/or determine better ways to know when comprehension is needed vs. optional.
What I don't like is the impossible middle ground where people are asked to 20X their output while taking full responsibility for 100% of the code at the same time. That is the kind of magical thinking that I am certain the market will eventually delete. You have to either give up on comprehension or accept a modest, 20% productivity boost at best.
Richard Gabriel wrote a famous essay, Worse Is Better (https://www.dreamsongs.com/WorseIsBetter.html). The MIT approach vs. the New Jersey approach doesn't necessarily map onto the merits of coding agents, but the essay's philosophy seems relevant. AI coding sometimes sacrifices correctness or cleanliness for simplicity, but it will win, and win big, as long as the produced code works to its users' standards.
Also, the essay notes that once a "worse" system is established, it can be incrementally improved. Following that argument, we can say that as long as the AI code runs, it creates a footprint. Once the software has users and VC funding, developers can go back and incrementally improve or refactor the AI's mess, to a satisfying degree.
> The engineer who pauses to deeply understand what they built falls behind in velocity metrics.
This is the most insidious part. It's not even that bad code gets deployed. That can be fixed and hopefully (by definition) the market weeds that out.
The problem is that the market doesn't seem to operate like that, and instead the engineer who cares loses their job because they're not hitting the metrics.
It used to take years, decades, or centuries before a system could grow and evolve to be so complex and unwieldy, and so full of internal contradictions, that the whole thing becomes an incomprehensible tangle of hairballs. An example is the patchwork system of international, national, regional, and local laws we have at present, which has grown and evolved over centuries.
Now, it can take only a few days or weeks.
You might lose context of a specific project over time, but not of the language itself. When you're no longer involved with the project's implementation or the programming language itself, what remains?
> When an engineer writes code manually, two parallel processes occur. The first is production: characters appear in files, tests get written, systems change. The second is absorption: mental models form, edge cases become intuitive, architectural relationships solidify into understanding.
That absorption only takes place in the mind of that individual, unfortunately. That doesn't help when they no longer work there or are on vacation.
The ideal situation is the solo open source project. You wrote all 200K lines of code yourself, and will maintain them until death. :)
The organizational memory and on-call debugging sections allude to this, but there are significant effects on other parts of the organization. For example, if I work in product support and a customer asks about a product's behavior, it becomes much more challenging to find answers if documentation is sparse (or AI-written), engineers don't immediately know the basics of the code they wrote, etc. Even if documentation is great and engineers can discuss their code, the pace of shipping updates can be a huge challenge for other teams to keep up with.
Great article. I agree with the argument.
But to offer a counterargument: wouldn't the same thing have happened with the rise of high-level languages? The machine code was abstracted away from engineers and they lost understanding of it, only knowing what the high-level code was supposed to do. But that turned out fine. Would LLMs abstracting the code away, so that engineers only understand the functionality (specs, tests), also be fine for the same reason? Why didn't cognitive debt rise with high-level languages?
A counter-counterargument is that compilers are deterministic, so understanding the procedure of the high-level language meant you understood the procedure that mattered in the machine code, and the stuff abstracted away wasn't necessary to the code's operation. But LLMs are probabilistic, so understanding the functionality does not mean understanding the procedure of the code in the ways that matter. I'd love to hear other people's thoughts on that.
Nothing new here, but the article is so well written and clear in how it presents the effects that it is a must-read.
One could argue with its stance, but I took it as a given (the equation for cognitive debt touches on science).
It feels entirely logical to view LLMs/coding agents as an almost final step in the short-term focus the overall system has been thriving on.
Very much feel this.
I wrote a SaaS project over the weekend and was amazed at how fast Claude implemented features. One sentence turned into a TDD that looked right to me, and the features worked.
But now, 3 weeks later, I have only the outlines of how it works, and regaining context on the system sounds painful.
In projects I hand-wrote, I could probably still locate the major files and recall the system architecture after years away.
I have been at a big company for 4 years, and following the zillions of projects going on here and there, and how they interact [nicely or not], has become a job in itself.
Very disturbing, as I thought my technical skills would help me clarify the global picture. Exactly the contrary is happening.
> The second is absorption: mental models form, edge cases become intuitive, architectural relationships solidify into understanding. ... . The friction of implementation creates space for reasoning.
> This gap between output velocity and comprehension velocity is cognitive debt.
I have felt that lack of absorption during the last months; adding doomscrolling to the equation, I have felt my thinking disappearing.
I tried to speculatively expand that idea in this post
This thread is closely related: https://news.ycombinator.com/item?id=47194847
"The right amount of AI is not zero. And it’s not maximum."
I know the article probably isn't about this (I'll be reading it next); I just wanted to share that the title perfectly reminded me of the feeling of attempting the speed-reading technique explained in this old gem of a video (minute 20:15):
BOOKSTORES: How to Read More Books in the Golden Age of Content
You get the same thing when your company employs a lot of low-to-medium-skilled offshore devs. Every morning or every week you get a huge pile of code that sort of works, but there is simply no way to review it in a meaningful way. It's just too much. That's how I feel working with Claude Code. It cranks out a lot of code really quickly, but how do I know it's not creating subtle problems?
Just to make sure it's somewhere in these comments: the fundamental issue is people trying to measure something they don't understand. That is not new. The article gives an interesting exploration of how things break down in a new way when people focus too much on metrics instead of (IMO) the more robust approach of getting people who care to try to make something that feels quality. We're building crap, yes, but I blame the people who spend their time measuring "velocity" like it's a well defined term, not the coding tools being used to play the game.
It reminds me of Clay Christensen’s book How Will You Measure Your Life? In one of his talks, he talked about how companies get killed because they optimized for the wrong, short-term metrics. What we are seeing with AI could be a supercharged flavor of the Innovator’s Dilemma, where organizations optimize a pre-existing set of success metrics while missing the bigger picture because some previous assumptions no longer hold.
I really like the article. It’s not trying to sell fear (which does sell); it doesn’t paint the leadership as clueless. Nobody knows what is going to happen in the future. The article might be wrong on a few things. But it doesn’t matter. It points out a few assumptions that people might be missing, and that is great.
I love the concept of Cognitive Debt. I think it ties nicely with the idea that AI is creating Tactical Sharknados: https://news.ycombinator.com/item?id=47048857
Before AI, we'd discuss how to solve a problem with teammates. Even if we didn't remember exactly what we wrote six months ago, we at least remembered the general idea.
After AI, that understanding often disappears, to the point where we can't even direct the AI to fix the problem because we don't know what's wrong.
Also, AI often changes the code only in the context of the current problem, so we might introduce more bugs while fixing one.
The solution is, ironically, LLMs. You can construct a set of Claude skills to walk you through a code review, or to understand code (anything, even a course) fast.
This compression from complexity to understanding will be a big market going forward.
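For what it's worth, Claude Code lets you package that kind of walkthrough as a skill: a `SKILL.md` file with YAML frontmatter under `.claude/skills/`. The review steps below are my own illustration of what such a skill might contain, not a standard:

```markdown
---
name: codebase-walkthrough
description: Walk a reviewer through an unfamiliar change, layer by layer.
---

When asked to review a change in this repository:

1. Summarize what the change does in one paragraph, without quoting code.
2. List the entry points touched and trace one request through them.
3. Flag anything that alters persisted data, auth, or public APIs.
4. End with the three questions a maintainer should be able to answer
   from memory after this review.
```

The point is that the "relentless questions" live in version control instead of in one reviewer's head.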
So just get the AI to summarise the codebase giving you more time to design a better buggy whip.
> When circumstances eventually require that understanding, when something breaks in an unexpected way or requirements change in a way that demands architectural reasoning, the organization discovers the deficit.
Maybe it's because I work on such a small team on a project that's still getting started, but even with the chaos of LLM-generated code, I can't imagine a case like the above that the LLMs couldn't also address.
Great read though and I appreciated the article.
Let’s go back to terms and thinking from 5 years ago: it’s called rushing. People are rushing now and they’re making mistakes. Some are big and systemic, where they don’t pause to reflect on all the consequences, and some are more local, just bad coding bugs.
This reminds me again of _Programming as Theory Building_ by Peter Naur. With agents generating the code fast, we lose the time for building the theory in our heads.
My view is that we are shifting from the traditional form of engineering to a more AI-guided form, where maybe we are not learning as much about the code as about how to produce that code with correct instructions and high-level design.
It's like how we might not know how sewing is done, but we know how to feed instructions into a loom to produce it. I also agree it is still important to read the code and understand how it works, maybe take a moment to see what is happening, but we are learning something entirely different here.
This is like our whole technological society: many people only comprehend a small part of it at a time and only sketches of how other parts work
I think stronger determinism could dramatically improve the situation here. Right now, I don't know whether the same model, within the same hour, will produce consistent output given identical prompts and low temperature.
I have no clue what my compiler emits every time I hit F5. I don't need to comprehend the IL or ASM because I have a ~deterministic way to produce that output from a stable representation.
Writing a codebase as natural language is definitely feasible, but the way we're going about it right now is not going to support this. The vast majority of LLM coding comes out of ad-hoc human-in-the-loop sessions or stochastic agent swarms. If we want to avoid the comprehension gap, we need something closer to a compiler and linker that operates over a bucket of version-controlled natural-language documents.
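A minimal sketch of that compiler-like idea, under my own assumptions: each module has a version-controlled natural-language spec, a lockfile records spec hashes, and the (non-deterministic) generation step runs only when a spec actually changes, so everything around the LLM call stays deterministic. The `generate` callback stands in for the LLM:

```python
import hashlib
import json
from pathlib import Path


def spec_hash(path: Path) -> str:
    """Content hash of a natural-language spec file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build(spec_dir: Path, lockfile: Path, generate) -> list[str]:
    """Regenerate code only for specs whose hash changed since the last build.

    `generate(spec_path)` stands in for the LLM call; the driver itself is
    deterministic, like a compiler driven by a lockfile.
    """
    lock = json.loads(lockfile.read_text()) if lockfile.exists() else {}
    rebuilt = []
    for spec in sorted(spec_dir.glob("*.md")):
        digest = spec_hash(spec)
        if lock.get(spec.name) != digest:
            generate(spec)              # only here does the LLM run
            lock[spec.name] = digest
            rebuilt.append(spec.name)
    lockfile.write_text(json.dumps(lock, indent=2))
    return rebuilt
```

A second build over unchanged specs regenerates nothing, which is exactly the "stable representation" property the comment is asking for.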
More code written probably does mean less understanding per line (or per some more germane metric), statistically speaking. More dilute understanding probably does lead to more failures and longer recovery times. This feels like something better addressed as an end-to-end actuarial problem, though, rather than by trying to design metrics for something as elusive as understanding.
This seems very similar to the situation of a new employee dropped into a large codebase of varying quality. It seems like similar techniques will get you out of the mess?
Also, you can ask the coding agent for help at understanding it, unlike the old days when whoever wrote it is long gone.
Forgive me if I'm stating the obvious, but it is completely plausible to run a separate review of what the AI just created, explaining what decisions were made and why, and how they affect the existing system going forward. This review can have a critique section covering core failure modes you have found in the AI, or discrepancies unique to your setup. It can even be condensed from a verbose two-page document into the core relevant explanation, for future reference. I think SWEs sometimes have an ego about needing to understand everything entirely self-sufficiently, and so hold back on asking relentless questions like a child: "But why?" "But why?" "But why?" until it is revealed. That's a valid method in today's environment.
I wonder when we will realize that we just don’t need more software, just better software.
Good engineering has always been about minimizing the amount of effort it takes for someone to understand and modify your code. This is the motivation for good abstractions & interfaces, consistent design principles, single-responsibility methods without side-effects, and all of the things we consider "clean code".
These are more important than ever, because we don't have the crutch of "Teammate x wrote this and they are intimately familiar with it" which previously let us paper over bad abstractions and messy code.
This is felt more viscerally today because some people (especially at smaller/newer companies) have never had to work this way, and because AI gives us more opportunity to ignore it
Like it or not, the most important part of our jobs is now reviewing code, not writing it. And "shelved" ideas will now look like unmerged PRs instead of unwritten code.
Why wouldn’t you ask the AI to explain the architecture and code? It’s much better and more efficient at that than any human.
This happened to me yesterday. I gave a junior engineer a project. He turned it around really quickly with Cursor. I reviewed the code, got him to fix some things (again turned around really quickly with Cursor), and he merged it. I then tried a couple of test cases, and the system did the wrong thing on the second one. I asked him to fix it. He put a prompt like "fix this for the xyz case" into Cursor and submitted a PR. But when I looked at the PR, it was clearly wrong. The model had completely misunderstood the code. So I left a detailed comment explaining exactly what the code does.
He's moving so fast that he's not bothering to learn how the system actually works. He just implicitly trusts what the model tells him. I'm trying to get him to do end-to-end manual testing using the system itself (log into the web app in a local or staging environment and go through the actions a user would go through), but he just has the AI generate tests and trusts the output. So he completely misses things that would be clear if he learned the system at a deep level and could see how the individual project he's working on fits in with the larger system.
I see this with all the junior engineers on my team. They've never learned how to use a debugger and don't care to learn. They just ask the model. Sometimes they think critically about the system and the best way to do something, but not always. They often aren't looking that critically at the model's output.
this seems like one of those nonsense posts people will look at in a couple years and laugh at
More AI slop, huh?
Can we get rules against this or something at this point? It's every other post.
Management where I work is currently touting a youtube video from some influencer about the levels of AI development, one of the later ones being "you'll care that it works, not how".
We are all supposed to be advancing through these levels. Moving at a pace where you actually understand the system you're responsible for is now considered a performance issue. But also, we're "still held responsible for quality".
Needless to say I'm dusting off my resume, but I'm sure plenty of other companies are following the same playbook.
Just read every line of the generated code and make sure it is as clear and good as possible. If you can't understand it when it's new, you won't tomorrow either. This verification process places a natural limit on the rate at which you can safely generate code. I suppose you could reduce that to spot checks and achieve probabilistic correctness, but I would not venture there for things that matter.
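If you did go the spot-check route, it would at least want to be systematic rather than ad hoc. A sketch, assuming some `changed_hunks` list produced by your diff tooling: a seeded sample keeps the review budget fixed and the selection reproducible across re-runs.

```python
import random


def sample_for_review(changed_hunks: list[str], budget: int, seed: int = 0) -> list[str]:
    """Pick a fixed-size, reproducible subset of changed hunks for human review.

    Probabilistic correctness: if a defect appears in a fraction p of hunks,
    the chance the whole sample misses it shrinks as the budget grows.
    """
    if len(changed_hunks) <= budget:
        return list(changed_hunks)      # small changes get reviewed in full
    rng = random.Random(seed)           # seeded so a re-run picks the same hunks
    return rng.sample(changed_hunks, budget)
```

The seed makes the check auditable; whether the residual risk is acceptable is exactly the judgment call the comment warns about.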
Code has become cheaper to produce than to perceive.
Which means fixes can go in faster than it would take to first grok the code.
What’s missing in literally every single one of these conversations is testing.
Literally all you have to do is implement test-driven development and you solve 99.9% of these issues.
Even if you don’t go fully TDD, which I’m not necessarily a fan of, an extensive test suite that covers edge cases is necessary no matter what you do, and it’s a need-to-have when your code velocity is high.
This was true even for a company full of juniors pumping out code, like early-days Facebook: their monorepo grew insanely and took major refactors every few years, but it didn’t really matter because they had the resources to do it.
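The mechanical core of that discipline is small: write the edge-case assertions before the implementation exists, then let any implementation, human- or agent-written, be judged against them. A toy sketch (the `parse_version` helper and its cases are hypothetical, chosen only to illustrate the test-first shape):

```python
def test_parse_version():
    # Written first: these edge cases define "done" regardless of whether
    # a human or an agent produces the implementation below.
    assert parse_version("1.2.3") == (1, 2, 3)
    assert parse_version("v1.2.3") == (1, 2, 3)   # leading "v" tolerated
    assert parse_version("1.2") == (1, 2, 0)      # missing patch defaults to 0


def parse_version(s: str) -> tuple[int, int, int]:
    """Parse a dotted version string into a (major, minor, patch) tuple."""
    parts = s.lstrip("v").split(".")
    nums = [int(p) for p in parts] + [0] * (3 - len(parts))
    return tuple(nums[:3])


test_parse_version()
```

When the agent regenerates `parse_version`, the test is the part that persists, which is exactly the knowledge the thread worries about losing.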
And now programmers experience what it is like to be a user, trying to comprehend the system on their computer screen.
I propose a new paradigm: programmer experience, PX.
So, code generated by AI ideally would follow the rules of PX. Whatever those may turn out to be.
"The system they built feels slightly foreign even as it functions correctly." This is exactly the same issue that engineers who become managers have. You are further away from the code; your understanding is less grounded, it feels disconnected.
When software engineers become agent herders their day-to-day starts to resemble more that of a manager than that of an engineer.