It's an impossible thing to disprove. Anything you say can be countered by their "secret workflow" they've figured out. If you're not seeing a huge speedup well you're just using it wrong!
The burden of proof is 100% on anyone claiming the productivity gains
This gets comical when there are people, on this site of all places, telling you that using curse words or "screaming" with ALL CAPS on your agents.md file makes the bot follow orders with greater precision. And these people have "engineer" on their resumes...
There's no secret IMO. It's actually really simple to get good results. You just expect the same things from the LLM you would from a Junior. Use an MD file to force it to:
1) Include good comments in whatever style you prefer, document everything it's doing as it goes and keep the docs up to date, and include configurable logging.
2) Make it write and actually execute unit tests for everything before it's allowed to commit anything, again through the md file.
3) Ensure it learns from it's mistakes: Anytime it screws up tell it to add a rule to it's own MD file reminding it not to ever repeat that mistake again. Over time the MD file gets large, but the error rate plummets.
4) This is where it drifts from being treated as a standard Junior. YOU must manually verify that the unit tests are testing for the right thing. I usually add a rule to the MD file telling it not to touch them after I'm happy with them, but even then you must also now check that the agent didn't change them the first time it hit a bug. Modern LLM's are now worse at this for some reason. Probably because they're getting smart enough to cheat.
If you these basic things you'll get good results almost every time.
It's impossible to prove in either direction. AI benchmarks suck.
Personally, I like using Claude (for the things I'm able to make it do, and not for the things I can't), and I don't really care whether anyone else does.
Many of them are also exercising absurd token limits - like running 10 claudes at once and leaving them running continuously to "brute force" solutions out. It may be possible but it's not really an acceptable workflow for serious development.
We have had the fabled 10x engineer long before and independent of agentic coding. Some people claim it's real, others claim it's not, with much the same conviction. If something, that should be so clear cut, is debatable, why would anyone now be able to produce a convincing, discussion-resolving argument for (or against) agentic coding? We don't even manage to do that for tab/spaces.
The reason why both can't be resolved in a forum like this, is that coding output is hard to reason about for various reasons and people want it to be hard to reason about.
I would like to encourage people to think that the burden of proof always falls on themselves, to themselves. Managing to not be convinced in an online forum (regardless of topic or where you land on the issue) is not hard.
They remind me so much of that group of people who insist the scammy magnetic bracelets[1] "balance their molecules" or something making them more efficient/balanced/productive/energetic/whatever. They are also impossible to argue with, because "I feel more X" is damn near impossible to disprove.
[1] https://en.wikipedia.org/wiki/Power_Balance , https://en.wikipedia.org/wiki/Hologram_bracelet , https://en.wikipedia.org/wiki/Ionized_jewelry
> The burden of proof is 100% on anyone claiming the productivity gains
IMHO, I think this is just going to go away. I was up until recently using copilot in my IDE or the chat interface in my browser and I was severely underwhelmed. Gemini kept generating incorrect code which when pasted didn't compile, and the process was just painful and a brake on productivity.
Recently I started using Claude Code cli on their latest opus model. The difference is astounding. I can give you more details on how I am working with this if you like, but for the moment, my main point is that Claude Code cli with access to run the tests, run the apps, edit files, etc has made me pretty excited.
And my opinion has now changed because "this is the worst it will be" and I'm already finding it useful.
I think within 5 years, we won't even be having this discussion. The use of coding agents will be so prolific and obviously beneficial that the debate will just go away.
(all in my humble opinion)
Ah, the "then you are doing it wrong" defence.
Also, you have to learn it right now, because otherwise it will be too late and you will be outdated, even though it is improving very fast allegedly.
I mean, a DSL packed full of features, a full LSP, DAP for step debugging, profiling, etc.
https://github.com/williamcotton/webpipe
https://github.com/williamcotton/webpipe-lsp
https://github.com/williamcotton/webpipe-js
Take a look at my GitHub timeline for an idea of how little time this took for a solo dev!
Sure, there’s some tech debt but the overall architecture is pretty extensible and organized. And it’s an experiment. I’m having fun! I made my own language with all the tooling others have! I wrote my own blog in my own language!
One of us, one of us, one of us…
people claiming productivity gains do not have to prove anything to anyone. few are trying to open eyes of others but my guess is that will eventually stop. they will be the few though still left doing this SWE work in near future :)
Responses are always to check your prompts, and ensure you are using frontier models - along with a warning about how you will quickly be made redundant if you don't lift your game.
AI is generally useful, and very useful for certain tasks. It's also not initiating the singularity.
I go to meetups and enjoy myself so much; 80% of people are showing how to install 800000000 MCPs on their 92gb macbook pros, new RAG memory, n8n agent flows, super special prompting techniques, secret sauces, killer .md files, special vscode setups and after that they still are not productive vs just vanilla claude code in a git repos. You get people saying 'look I only have to ask xyz... and it does it! magic' ; then you just type in vanilla CC 'do xyz' and it does exactly the same thing, often faster.