One thing I find really funny is that when AI enthusiasts make claims about agents and their own productivity, it's always entirely anecdotally based on their own subjective experience, but when others make claims to the contrary, suddenly there is some overwhelming burden of proof that has to be met before any sort of claims regarding the capabilities of AI workflows can be made. So which is it?
It's an impossible thing to disprove. Anything you say can be countered by the "secret workflow" they've figured out. If you're not seeing a huge speedup, well, you're just using it wrong!
The burden of proof is 100% on anyone claiming the productivity gains
Some fuel for the fire: over the last two months mine has become way better, frequently one-shotting tasks. I do spend a lot of time in planning mode to flesh out proper plans. I don't know what others are doing that makes them so sceptical, but from my perspective, once I figured it out, it really is a massive productivity boost with minimal quality issues. I work on a brownfield project with about 1M LoC, fairly messy, mostly C# (so the strong typing & strict compiler are a massive boon).
My workflow: planning mode (iterations), execute plan, audit changes & prove to me the code is correct, debug runs + log ingestion to further prove it, human test, human review, commit, deploy. Iterate a couple of times if needed. I typically do around three of these in parallel to avoid overloading my brain. I have done six in the past, but then it hits me really hard (context-switch whiplash) and I start making mistakes and missing things the tool does wrong.
To the ones saying it is not working well for them, why don't you show and tell? I cannot believe our experiences are so fundamentally different, I don't have some secret sauce but it did take a couple of months to figure out how to best manipulate the tool to get what I want out of it. Maybe these people just need to open their minds and let go of the arrogance & resistance to new tools.
Actually, quite the opposite. It seems any positive comment about AI coding gets at least one response along the lines of "Oh yeah, show me proof" or "Where is the deluge of vibe-coded apps?"
For my part, I point out that there are a significant number of studies showing clear productivity boosts in coding, but those threads typically devolve into "How can they prove anything when we don't even know how to measure developer productivity?" (The better studies address this question and tackle it with well-designed statistical methods such as randomized controlled trials.)
Also, there are some pretty large GitHub repos out there that are mostly vibe-coded. Like, Steve Yegge got to something like 350 thousand LoC in six weeks on Beads. I've not looked at it closely, but the commit history is there for anyone to see: https://github.com/steveyegge/beads/commits/main/
- This has been going on for well over a year now.
- They always write relatively long, zealous explainers of how productive they are (including some replies to your comment).
These two points together make me think: why do they care so much about convincing me? Why don't they just link me to the amazing thing they made? That would be pretty convincing!
Are they being paid or otherwise incentivised to make these hyperbolic claims? To be fair, they don't often look like vanilla LLM output, but they do all have the same structure/pattern to them.
They are not the same thing. If something works for me, I can rule out "it doesn't work at all". However, if something doesn't work for me I can't really draw any conclusions about it in general.
> anecdotally based on their own subjective experience
So the “subjective” part counts against them. It’s better to make things objective. At the least, there should be reproducible examples.
When it comes to the “anecdotally” part, that doesn’t matter. Anecdotes are sufficient for demonstrating capabilities. If you can get a race car around a track in three minutes and it takes me four minutes, that’s a three-minute race car.
The author is not claiming that AI agents don't make him more productive.
"I use LLM-generated code extensively in my role as CEO of Carrington Labs, a provider of predictive-analytics risk models for lenders."
Productivity gains in programming have always been incredibly hard to prove, especially on an individual level. We've had these discussions a million times long before AI. Every time a manager tries to reward some kind of metric for "good" code, it turns out that it doesn't work that way. Every time Rust is mentioned, every C fan finds a million reasons why the improvement doesn't actually have anything to do with using Rust.
AI/LLM discussions are the exact same. How would a person ever measure their own performance? The moment you implement the same feature twice, you're already reusing learnings from the first run.
So, the only thing left is anecdotal evidence. It makes sense that on both sides people might be a little peeved or incredulous about the other's claims. It doesn't help that both sides (though mostly AI fans) have very rabid supporters who will just make up shit (like AGI, or the water usage).
Imho, the biggest part missing from these anecdotes is exactly what you're using, what you're doing, and what baseline you're comparing it to. For example, using Claude Code in a typical, modern, decently well-architected Spring app to add a bunch of straightforward CRUD operations for a new entity works absolutely flawlessly, compared to a junior or even mid-level dev.
Copy-pasting code into an online chat for a novel problem, in an untyped, rare language, with only basic instructions and no way for the chat to run it, will basically never work.
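To make the first case concrete, here is a rough sketch of the kind of CRUD boilerplate meant above; the entity, fields, and routes are invented for illustration, not taken from any real project:

    // Hypothetical Spring Boot CRUD boilerplate of the kind agents one-shot well.
    // Entity, repository, and route names are all made up for this example.
    import java.util.List;
    import jakarta.persistence.Entity;
    import jakarta.persistence.GeneratedValue;
    import jakarta.persistence.Id;
    import org.springframework.data.jpa.repository.JpaRepository;
    import org.springframework.web.bind.annotation.*;

    @Entity
    class Invoice {
        @Id @GeneratedValue Long id;
        String customerName;
        // getters/setters omitted for brevity
    }

    // Spring Data JPA derives the standard CRUD operations from this interface.
    interface InvoiceRepository extends JpaRepository<Invoice, Long> {}

    @RestController
    @RequestMapping("/invoices")
    class InvoiceController {
        private final InvoiceRepository repo;
        InvoiceController(InvoiceRepository repo) { this.repo = repo; }

        @GetMapping
        List<Invoice> all() { return repo.findAll(); }

        @PostMapping
        Invoice create(@RequestBody Invoice invoice) { return repo.save(invoice); }

        @DeleteMapping("/{id}")
        void delete(@PathVariable Long id) { repo.deleteById(id); }
    }

This is exactly the shape of code with one obviously right answer, which is plausibly why agents handle it so reliably.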
The people having a good experience with it want the people who aren't to share how they're using it, so they can tell them how they're doing it wrong.
Honestly though, I don't really care about coding with it; I rarely get to leave Excel for my work anyway. The fact that I can OCR anything in about a minute is a game changer, though.
Claims based on personal experience working on real world problems are likelier to be true.
It’s reasonable to accept that AI tools work well for some people and not for others.
There are many ways to integrate these tools and their capabilities vary wildly depending on the kind of task and project.
I will preface all this by saying I'm not in a professional programming position, but I would consider myself an advanced amateur, and I do code some for work (general IT stuff).
I think the core problem is that a lot of people view AI incorrectly and thus can't use it efficiently. Everyone wants AI to be a Jr or Sr programmer, but I have serious doubts as to the ability of AI to ever have original thought, which is a core requirement of being a programmer. I don't think AI will ever be a programmer, but rather a tool to help programmers take the tedium away. I have seen massive speedups in my own workflow from removing the tedium.
I have found prompting AI to be of minimal use, but tab-completion definitely speeds stuff up for me. If I'm about to create some for loop, AI will usually have a pretty good scaffold for me to use. If I need to handle an error, I start typing and AI will autocomplete the error handling. When I write my function documentation I am usually able to just tab-complete it all.
Yes, I usually have to go back and fix some things, and I will often skip various completion hints, but the scaffold is there, and as I start fixing faulty code it generated, AI will usually pick up on the fixes and help me tab-complete them. If AI isn't giving me any useful tab-completions, I'll just start coding what I need, and AI picks up after a few lines and I can tab-complete again.
Occasionally I will give a small prompt such as "Please write me a loop that does X" or "Please write a setter function that validates the input", but I'll still treat that as a scaffold and go back and fix things. I always give it pretty simple tasks and treat it simply as a scaffold generator.
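As a hypothetical illustration of what I mean by a scaffold (the function and file-handling details are invented for this example, not taken from my actual work):

    // You type the doc comment and the "for (" line; tab-completion
    // typically offers the loop body, the try/catch, and the rest.
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.List;

    class ReportLoader {
        /** Returns the line count of each report, skipping unreadable files. */
        static List<Integer> lineCounts(List<Path> reports) {
            List<Integer> counts = new ArrayList<>();
            for (Path report : reports) {
                try {
                    counts.add(Files.readAllLines(report).size());
                } catch (IOException e) {
                    // the error-handling branch is exactly the kind of thing
                    // that gets tab-completed once you start typing "try"
                    System.err.println("Skipping " + report + ": " + e.getMessage());
                }
            }
            return counts;
        }
    }

Everything after the first couple of lines is the sort of thing I'd expect tab-completion to offer; then I go back and adjust the details.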
I still run into the same problem-solving issues I had before AI (how do I tackle X problem?), and there isn't nearly as much speedup there (although now, instead of talking to a rubber duck, I can chat with AI to help figure things out), but once I settle on a solution and start implementing it, I get that AI tab-completion boost again.
With all that being said, I do also see massive boosts with fairly basic tasks that can be templated off something that already exists, such as creating unit tests or scaffolding a class, although I do need to go back and tweak things.
In summary, yes, I probably do see a 10x speedup, but it's really a 10x speedup in my typing speed more than a 10x speedup in solving the core issues that make programming challenging and fun.
What I enjoy the most is that every "AI will replace engineers" article is written by an employee at an AI company, with testimonials from other people also working at AI companies.
Now that the "our new/next model is so good that it's sentient and dangerous" AGI hype has died down, the new hype goalpost is "our new/next model is so good it will replace your employees and do their jobs for you".
Within that motte and bailey is, "well my AI workflow makes me a 100x developer, but my workflow goes to a different school in a different town and you don't know her".
There's value there, I use local and hosted LLMs myself, but I think there's an element of mania at play when it comes to self-evaluation of productivity and efficacy.
As a CS student who kinda knows how to build things, I do in fact get a speedup when querying AI or letting AI do some coding for me. However, I have a poor understanding of the systems it builds, and it does a quite frankly terrible job with project architecture. I use Claude Sonnet 4.5 with Claude Code, and I can get things implemented rather quickly while using it, but if anything goes wrong I just don't have that great an idea of where anything is, what code is in charge of what, etc. I can also deeply feel the brainrot of using AI. I get lazy, and I can feel myself getting worse at solving what should be easy problems. My mental image of the problem to solve gets fuzzy, and I don't train that muscle like I would if I didn't use AI to help me solve it.
This is not always the case, but I get the impression that many of them are paid shills, astroturf accounts, bots, etc. Including on HN. Big AI is running on an absurd amount of capital and they're definitely using that capital to keep the hype cycle going as long as possible while they figure out how to turn a profit (or find an exit, if you're cynical - which I am).
I think it's a complex discussion because there's a whole bundle of new capabilities, the largest one arguably being that you can build a conversational interface to any piece of software. There's tons of pressure to express this in terms of productivity, financial and business benefits, but like with a coding agent, the main win for me is reduction of cognitive load, not an obvious "now the work gets done 50% faster so corporate can cut half the dev team."
I can talk through a possible code change with it, which is just a natural, easy and human way to work; our brains evolved to talk and figure things out in conversation. The jury is out on how much this actually speeds things up or translates into cost savings. But it reduces cognitive load.
We're still stuck in a mindset where we pretend knowledge workers are factory workers who can sit there for 8 hours producing consistently with their brains turned off. "A couple hours a day of serious focus at best" is closer to the reality, so maybe an LLM can turn the other half of the day into something more useful?
There is also the problem that any LLM provider can and absolutely will enshittify the LLM overnight if they think it's in their best interest (feels like OpenAI has already done this).
My extremely casual reading of whatever research I've seen discussed suggests that maybe, with high-quality AI tools, you can get work done 10-20% faster? But you don't have to think quite as hard, which is where I feel the real benefit is.
>> when others make claims to the contrary suddenly there is some overwhelming burden of proof that has to be reached
That is just plain narcissism. People seeking attention in the slipstream of megatrends make claims that have very little substance. When they are confronted with rational argument, they can't respond intellectually, so they try to dominate the discussion by demanding an overwhelming burden of proof, while their own position remains underwhelming.
LinkedIn and Medium are densely concentrated with this sort of content. It’s all for the likes.
There are different types of contrary claims though, which may be an issue here.
One example: "agents are not doing well with code in languages/frameworks which have many recent large and incompatible changes like SwiftUI" - me: that's a valid issue that can be slightly controlled for with project setup, but still largely unsolved, we could discuss the details.
Another example: "coding agents can't think and just hallucinate code" - me: lol, my shipped production code doesn't care, bring some real examples of how you use agents if they don't work for you.
There's a lot of the second type on HN.
Last time I ran into this, it came down to a difference in how the person used the AI. They weren't even using the agents; they were complaining that the AI didn't do everything in one shot in the browser. You have to figure out how people are using the models, because everyone was using AI in the browser in the beginning, and a lot of people still are. Those of us praising the agents are using things like Claude Code. There is a night-and-day difference in how you use it.
Public discourse on this is a dumpster fire. But you're not making a meaningful contribution.
It is the equivalent of saying: stenotype enthusiasts claim they're productive, but when we give stenotypes to a large group of typists, we get data disproving that.
Which should immediately highlight the issue.
As long as these discussions aren't prefaced with the metric and methodology, any discussion on this is just meaningless online flame wars / vibe checks.
Subjective experience is heavily influenced by expectations and desires, so they should try to verify.
This is why I can’t wait for the costs of LLMs to shoot up. Nothing tells you more about how people really feel about AI assistants than how much they are willing to pay for them. These AIs are useful, but I would not pay much more than what they are priced at today.
> One thing I find really funny is that when AI enthusiasts make claims about agents and their own productivity, it's always entirely anecdotally based on their own subjective experience, but when others make claims to the contrary, suddenly there is some overwhelming burden of proof that has to be met before any sort of claims regarding the capabilities of AI workflows can be made. So which is it?
Really? It's little more than "I am right and you are wrong."
On one hand "this is my experience, if you're trying to tell me otherwise I need extraordinary proof" is rampant on all sides.
On the other hand, one group is saying they've personally experienced a thing working, while the other group says that thing is impossible... well, it seems to the people who have experienced the thing that the problem is with the skeptic and not the thing.
Everything you need to know about AI productivity is shown in this first chart here:
https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
It's because the thing is overhyped and too many people are vested in keeping the hype going. Facing reality at this point, while necessary, is tough. The number of ads for scam degrees from reputable unis about bullshit 'Chief AI Officer' positions is staggering. There's just too much AI bubbling.
TBH a lot of this is subjective. Including productivity.
My other gripe too is productivity is only one aspect of software engineering. You also need to look at tech debt introduced and other aspects of quality.
Productivity also takes many forms so it's not super easy to quantify.
Finally... software engineers are far from being created equal. There's a VERY big difference between what someone doing CRUD apps for a small web-dev shop does and what, e.g., an infra engineer in big tech does.
It's really a high-level bikeshed. Obviously we are all still using and experimenting with LLMs. However, there is a huge gap in experiences and total usefulness depending on the exact task.
The majority of HNers still reach for LLMs pretty regularly, even if they frequently fail horribly. That's really the pit the tech is stuck in. Sometimes it one-shots your answer perfectly, or pair-programs with you perfectly for one task, or notices a bug you didn't. Sometimes it wastes hours of your time for various subtle reasons. Sometimes it adamantly insists 2 + 2 = 55.
If someone seems to have productivity gains when using an AI, it is hard to come up with an alternate explanation for why they did.
If someone sees no productivity gains when using an AI (or a productivity decrease), it is easy to come up with ways it might have happened that weren't related to the AI.
This is an inherent imbalance in the claims, even if both people have brought 100% proof of their specific claims.
A single instance of something doing X is proof of the claim that something can do X, but no amount of instances of something not doing X is proof of the claim that something cannot do X. (Note, this is different from people claiming that something always does X, as one counterexample is enough to disprove that.)
Same issue in math with the difference between proving a conjecture is sometimes true and proving it is never true. Only one of these can be proven by examples (and only a single example is needed). The other can't be proven even by millions of examples.
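Put in quantifier form, with Works(x) standing in for whatever capability is being claimed, the asymmetry is:

    ∃x. Works(x)      is proved by a single positive example
    ¬∀x. Works(x)     is proved by a single counterexample
    ∀x. ¬Works(x)     cannot be proved by any number of examples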
The answer to "which is it" is clear: the enthusiasts have spent countless hours learning, configuring, adjusting, figuring out limitations, guarding against issues, etc., and now do 50 to 100 PRs per week like Boris.
Others … need to roll up their sleeves and catch up.
People working in languages/libraries/codebases where LLMs aren't good is a thing. That doesn't mean they aren't good tools, or that those things won't be conquered by AI in short order.
I try to assume people who are trashing AI are just working in systems like that, rather than being bad at using AI, or worse, shit-talking the tech without really trying to get value out of it because they're ethically opposed to it.
A lot of strongly anti-AI people are really angry human beings (I suppose that holds for vehemently anti-<anything> people), which doesn't really help the case; it just comes off as an old man shaking his fist at clouds, except too young. The whole "microslop" thing came off as classless and bitter.
A while ago someone posted a claim like that on LinkedIn again. And of course there was the usual herd of LinkedIn sheep who were full of compliments and wows about the claim he was making: a 10x speedup of his daily work.
The difference from the zillion others who did the same is that he attached a link to a live stream where he was going to show his 10x speedup on a real-life problem. Credit to him for doing that! So I decided to go have a look.
What I then saw was him struggling for an hour with some simple extension to his project. He didn't manage to finish in the hour what he had planned to. And when I thought about how much time it would have cost me by hand, I found it would have taken me just as long.
So I answered him in his LinkedIn thread and asked where the 10x speedup was. What followed was complete denial. It had just been a hiccup. Or he could have done other things in parallel while waiting 30 seconds for the AI to answer. Etc. etc.
I admit I was a sceptic at the start, but I had honestly been hoping that my scepticism would be proven wrong. It was not.