The problem with your argument here is that you're saying that developers (like myself) who put effort into figuring out good workflows for coding with LLMs are deceiving themselves, and are effectively wasting their time.
Either I've wasted significant chunks of the past ~3 years of my life or you're missing something here. Up to you to decide which you believe.
I agree that it's hard to take solid measurements due to non-determinism. The same goes for managing people, and yet somehow many good engineering managers can judge if their team is performing well and figure out what levers they can pull to help them perform better.
Another problem with it is that you could have said the same thing about virtually any advancement in programming over the last 30 years.
> who put effort into figuring out good workflows for coding with LLMs are deceiving themselves, and are effectively wasting their time.
It's quite possible you do. Do you have any hard data justifying the claims of "this works better", or is it just a soft fuzzy feeling?
> The same goes for managing people, and yet somehow many good engineering managers can judge if their team is performing well
It's actually really easy to judge if a team is performing well.
What is hard is finding what actually makes the team perform well. And that is just as much magic as "if you just write the correct prompt, everything will just work".
---
wait. why are we fighting again? :) https://dmitriid.com/everything-around-llms-is-still-magical...
I'm not the OP and I"m not saying you are wrong, but I am going to point out that the data doesn't necessarily back up significant productivity improvements with LLMs.
In this video (https://www.youtube.com/watch?v=EO3_qN_Ynsk) they present a slide from the company DX, which surveyed 38,880 developers across 184 organizations and found respondents claiming an average time savings of 4 hours per developer per week. So all of these LLM workflows are making the average developer only about 10% more productive in a given work week, with a bunch of developers saving less than that and few saving more.
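(To spell out the arithmetic: the 10% is just the survey's 4 hours set against a 40-hour week. The 40 hours is my assumption; the DX slide reports hours saved, not a percentage.)

```python
# Back-of-the-envelope behind the ~10% figure.
# Assumption: a standard 40-hour work week.
hours_saved = 4
work_week = 40
print(f"implied gain: {hours_saved / work_week:.0%}")  # -> implied gain: 10%
```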
In this video by Stanford researchers actively studying productivity using GitHub commit data from private and public repositories (https://www.youtube.com/watch?v=tbDDYKRFjhk), there are a few very important data points:
1. They have found zero correlation between how productive respondents claim to be and how productive they actually measure as, meaning people are poor judges of their own productivity. This would refute the claims in my previous point, but only if you assume people are, on average, wildly more productive than they claim.
2. They have measured an actual increase in rework and refactoring commits in those repositories as AI tools come into wider use in the organizations. So even if teams can ship things faster, they are seeing an increased number of pull requests that fix the previous pushes (a crude sketch of one way to compute such a rework metric follows this list).
3. They have measured pretty good productivity gains on greenfield, low-complexity systems, but once you get toward higher-complexity or brownfield systems, the measured gains drop sharply, sometimes even turning into negative productivity with AI tools.
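To make point 2 concrete, here's a minimal sketch of one way a rework metric could be approximated from commit history. To be clear, this is not the Stanford team's methodology (they analyze the code changes themselves); it uses commit-message keywords as a crude proxy, and the keyword list and time window are assumptions on my part.

```python
# Crude "rework rate" proxy from git history: the share of recent
# commits whose subject line suggests fixing up earlier work.
import re
import subprocess

REWORK = re.compile(r"\b(fix|revert|rework|refactor|hotfix)\b", re.IGNORECASE)

def rework_rate(repo: str, since: str = "6 months ago") -> float:
    """Fraction of commits since `since` whose message suggests rework."""
    subjects = subprocess.run(
        ["git", "-C", repo, "log", f"--since={since}", "--pretty=%s"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    if not subjects:
        return 0.0
    return sum(1 for s in subjects if REWORK.search(s)) / len(subjects)

if __name__ == "__main__":
    print(f"rework rate: {rework_rate('.'):.2%}")
```

The interesting comparison is this rate before versus after AI tools were adopted in a given codebase, which is roughly the shift the Stanford data shows.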
All of this goes hand in hand with this research paper: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o... in which experienced devs working on significant long-term projects lost productivity when using AI tools, while remaining convinced the tools were making them more productive.
Yes, all of these studies have flaws and nitpicks we could go over, and I'm not interested in rehashing them. However, there are far more data points and studies showing AI delivering a very marginal productivity boost compared to what people claim than the other way around. I'm legitimately interested in other studies that show significant productivity gains on brownfield projects.
So far I've found that the people who are hating on AI are stuck maintaining highly coupled code that they've invested a significant amount of mental energy internalizing. AI is bad on that type of code, and since they've spent so much energy understanding it, it ends up taking them longer to load context and guide the AI than to just do the work themselves. Their codebase is hot coupled garbage, and rather than accept that the tools aren't working because of their own lack of architectural rigor, they just shit on the tools. This is part of the reason the study of open source maintainers using Cursor didn't consistently produce improvement (also, Cursor is pretty mid).
https://www.youtube.com/watch?v=tbDDYKRFjhk&t=4s is one of the largest studies I've seen so far and it shows that when the codebase is small or engineered for AI use, >20% productivity improvements are normal.
That's not a problem, that is the argument. People are bad at measuring their own productivity. Just because you feel more productive with an LLM does not mean you are. We need more studies and less anecdata.