
throwaw12 · today at 5:31 AM

People who are saying they're not seeing a productivity boost, can you please share where it is failing?

Because I am terrified by the output I am getting while working on huge legacy codebases: it works. I described one of my workflow changes here: https://news.ycombinator.com/item?id=47271168 but in general, compared to the old way of working, I am consistently saving half of the steps, whether it's researching the codebase, integrating new things, or making fixes. I have stopped writing code. Occasionally I jump into the changes proposed by the LLM and make manual edits if feasible; otherwise I revert the changes and ask it to generate again, based on what I learned from the rejected output.

I am terrified about what's coming


Replies

yoyohello13 · today at 6:06 AM

The companies laying off people have no vision. My company is a successful not-for-profit and we are hiring like crazy. It's not a software company, but we have effectively unlimited work. Why would anyone downsize because work is getting done faster? Just do more work, get more done, get better than the competition, get better at delivering your vision. We put profits back into the community and actually make life better for people. What a crazy fucking concept, right?

kranke155 · today at 8:40 AM

I work in commercials.

We can now make $1 million commercials for $100,000 or less: a 90% reduction in cost, if we use AI.

The issue is they don’t look great. AI isn’t that great at some key details.

But the agencies are really trying to push for it.

They think this is the way back to the big flashy commercials of old. Budgets are lower than ever, and shrinking.

The big issue here is really a misunderstanding of cause: budgets are lower because advertising has changed in general (TV is less and less important), and a lot of studies have shown that advertising is actually not all that effective.

So they are grabbing onto a lifeboat. But I’m worried there’s no land.

I’ve planned my exit.

iugtmkbdfil834 · today at 8:47 AM

I don't want to generalize too much from my specific situation, but I want to offer an anecdote from my neck of the woods. On my personal subscription, I agree it is kinda crazy the kind of projects I can get into now with little to no prior knowledge.

On the other hand, our corporate AI is... not great at the moment. It was briefly kinda decent, and then it suddenly degraded. Worst of all, no one is communicating with us, so we don't know what was changed. It is possible companies are already trying to 'optimize'.

I know this is not exactly what you are asking: you are saying the capability is there. But I am personally starting to see a crack in corporate willingness to spend.

oytis · today at 8:45 AM

> I have stopped writing code. Occasionally I jump into the changes proposed by the LLM and make manual edits if feasible; otherwise I revert the changes and ask it to generate again, based on what I learned from the rejected output.

Isn't that a very inefficient way to learn things? Normally, you would learn how things work and then write the code, refining your knowledge as you write. Now you don't learn anything in advance, and only do so reluctantly when things break. In the end there is a codebase that no one knows how it works.

kodablah · today at 5:57 AM

> People who are saying they're not seeing a productivity boost, can you please share where it is failing?

At review time.

There are simply too many software industries that can't delegate both authorship _and_ review to non-humans, because the maintenance and use of such software, especially in libraries and backwards-compatibility-sensitive environments, cannot justify an "ends justify the means" approach (yet).

msvana · today at 7:13 AM

I work as an ML engineer/researcher. When I implement a change in an experiment, it usually takes at least an hour to get the results. I can use this time to implement a different experiment. It doesn't matter if I do it by hand or let an agent do it for me; I have enough time. Code isn't the bottleneck.

I have also heard the opinion that since writing code is cheap, people implement things that have no economic value, without really thinking them through.

belZaah · today at 7:10 AM

I don't think the objections are necessarily about a lack of productivity, although my personal experience is not one of massive productivity increases. The fact that you are producing code much faster is likely just to push the bottleneck somewhere else. Software value cycles are long and complicated. What if you run into an issue in 5 years that the LLM fails to diagnose or fix due to complex system interactions? How often would that happen? Would it be feasible to just generate the whole thing anew, matching functionality precisely? Are you making the right architecture choices from the perspective of what the preferred modus operandi of an LLM will be in 5 years? We don't know. The more experienced folks tend to be conservative, as they have experienced how badly things can age. Maybe this time it'll be different?

jpollock · today at 8:16 AM

The last time I tried AI, I tested it with a stopwatch.

The group used feature flags...

    if (a) {
        // new code
    } else {
        // old code
    }

    void testOff() {
        disableFlag(a);
        // test that the old path still works
    }

    void testOn() {
        enableFlag(a);
        // test that the new path still works
    }
However, as with any cleanup, it doesn't happen. We have thousands of these things lying around taking up space. I thought, "I can give this to the AI; it won't get bored or complain."
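For context, the mechanical rewrite each cleanup amounts to can be sketched as below. This is a hypothetical illustration, not the group's actual tooling; `SOURCE`, `FLAG_PATTERN`, and `remove_flag` are invented names, and real tooling would parse the AST rather than pattern-match text.

```python
import re

# Toy source in the shape of the flag-guarded block above.
SOURCE = """\
if (a) {
    // new code
} else {
    // old code
}
"""

# Match the guard, capture the "new code" branch, and swallow the else branch.
# A regex only handles this simple, well-formatted case.
FLAG_PATTERN = re.compile(
    r"if \(a\) \{\n(?P<new>(?:.*\n)*?)\} else \{\n(?:.*\n)*?\}\n"
)

def remove_flag(source: str) -> str:
    """Replace the if/else guard with the body of the kept branch, dedented one level."""
    def keep_new_branch(m: re.Match) -> str:
        body = m.group("new")
        return "".join(
            line.removeprefix("    ") for line in body.splitlines(keepends=True)
        )
    return FLAG_PATTERN.sub(keep_new_branch, source)

print(remove_flag(SOURCE))  # prints: // new code
```

The cleanup also has to delete `testOff` and the flag definition, which is exactly the part the commenter says the AI kept getting wrong.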

I can do one flag in ~3 minutes: code edit, PR prepped and sent.

The AI can do one in ~10 minutes, but I couldn't look away. It kept trying to use find/grep to search through a huge repo for symbols (instead of using the MCP service).

Then it ignored instructions: it didn't clean up one test or the other, left unused fields and parameters, and generally made a mess.

Finally, I needed to review and fix the results, taking another 3-5 minutes, with no guarantee that it compiled.

At that point, a task that takes me 3 minutes has taken me 15.

Sure, it made code changes, and it felt "cool", but it cost the company 5x what not using the AI would have (before even considering the token cost).

Even worse, the CI/CD system couldn't keep up with my individual velocity of cleaning these up. Using an automated tool? Yeah, not going to be pleasant.

However, I need to try again; everyone's saying there was a step change in December.

staticassertion · today at 8:28 AM

When it comes to novel work, LLMs become "fast typists" for me and little more. They accelerate testing phases, but that's it. The bar for novelty isn't very high either: "make this specific system scale in a way that others won't" isn't a thing an LLM can ever do on its own, though it can be an aid.

LLMs are also quite bad at security. They can find simple bugs, but they don't find the really interesting ones that leverage the gap between mental model and implementation, or a combination of features and bugs, which is where most of the interesting security work is, imo.

aurareturn · today at 5:52 AM

> People who are saying they're not seeing a productivity boost, can you please share where it is failing?

Believe it or not, I still know many devs who do not use any agents. They're still using free ChatGPT with copy and paste.

I'm going to guess that many people on HN are also on the "free ChatGPT isn't that good at programming" train.

apsurd · today at 6:24 AM

AI dramatically increases velocity. But is velocity productivity? Productivity relative to which scope: you, the team, the department, the company?

The question is really, velocity _of what_?

I got this from an HN comment. It really hit home for me, because the default mentality for engineers is to build: the more you build the better. That's not "wrong", but in a business setting it is necessary yet not sufficient. And so whenever we think about productivity, impact, velocity, or whatever measure of output, the real question is: of what? More code? More product surface area? That was never really the problem. In fact it makes life worse the majority of the time.

sivanmz · today at 7:41 AM

It's been my experience recently. I point it at an issue tracker and ask it to investigate, write a test to reproduce the problem, and plan a fix together. There's lots of hand-holding from me, but it saves me a lot of work, and I've been surprised by its comfort with legacy codebases. For now I feel empowered, and I'm actually working more intensively, but I was wondering to myself if I'm going to run out of work this year. Interestingly, our metrics show that output is slowed by the increased workload on reviewers.

pinkmuffinere · today at 6:14 AM

I asked Opus 4.6 how to administer an A/B test when data is sparse. My options are to look at conversion rate, revenue per customer, or something else. I will get about 10-20k samples; fewer than that will add to cart, fewer still will begin checkout, and even fewer will convert. Opus says I should look at revenue per customer. I don't know the right answer, but I know it is not revenue per customer: that will have high variance due to outlier customers who put in a large order. To be fair, I do use Opus frequently, and it often gives good-enough answers. But you do have to be suspicious of its responses for important decisions.

Edit: Ha, and the report claims it's relatively good at business and finance...

Edit 2: After discussion in this thread, I went back to opus and asked it to link to articles about how to handle non-normally distributed data, and it actually did link to some useful articles, and an online calculator that I believe works for my data. So I'll eat some humble pie and say my initial take was at least partially wrong here. At the same time, it was important to know the correct question to ask, and honestly if it wasn't for this thread I'm not sure I would have gotten there.
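The outlier intuition can be checked with a toy calculation. This is a hypothetical sketch (the traffic and order amounts are invented, not from the thread), comparing the relative noise of the two candidate metrics:

```python
import statistics

# Deterministic toy data for one test arm of 10,000 visitors: 3% convert,
# most orders are $50, but 1% of converters are outlier "whale" orders of $5,000.
revenue = [0.0] * 9_700 + [50.0] * 297 + [5_000.0] * 3
conversion = [1 if r > 0 else 0 for r in revenue]

def cv(xs):
    """Coefficient of variation (stdev / mean): the relative noise of a metric."""
    return statistics.pstdev(xs) / statistics.mean(xs)

print(f"conversion rate:     CV = {cv(conversion):.1f}")   # ~5.7
print(f"revenue per visitor: CV = {cv(revenue):.1f}")      # ~29.1
```

Required sample size scales roughly with CV squared, so with numbers like these the revenue metric needs on the order of (29.1 / 5.7)^2, about 26x, more samples for the same statistical power, which matches the suspicion about outlier orders.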

boxedemp · today at 5:43 AM

I'm with you. The project I'm working on is moving at phenomenal velocity. I'm basically spending my time writing specs and performing code reviews. As long as my code-review comments and design docs are clear, I get a secure, scalable, and resilient system.

Tests were always important, but now they are the gatekeepers to velocity.

dataflow · today at 6:00 AM

I feel like this might be heavily dependent on both your task and the AI you're using? What language do you code in and what AI do you use? And are your tasks pretty typical/boilerplate-y with prior art to go off of, or novel/at-the-edge-of-tech?

wasmainiac · today at 6:29 AM

Because its failure rate is too high. Beyond boilerplate code and CRUD apps, if I let AI run freely on the projects I maintain, I spend more time fixing its changes than if I had just done it myself. It hallucinates functionality, it designs itself into corners, it does not follow my instructions, and it writes too much code for simple features.

It's fine at replacing what Stack Overflow did nearly a decade ago, but that isn't really an improvement over my baseline.

RandomLensman · today at 6:42 AM

Outside of coding and other non-physical areas, the impact can be quite muted. I haven't seen much impact on surgical procedures, for example (but maybe others have?).

fulafel · today at 6:00 AM

A terminology tangent, because it's an econ publication: notice that the article doesn't talk about productivity.

Productivity is a term of art in economics: it means you generate more units of output (for example per person, per input, or per wages paid), but it doesn't take quality or desirability into account. It's best suited to commodities and industrial outputs (and maybe slop?).

therealdrag0 · today at 5:49 AM

I can only explain it by people not having used agentic tools, or only having tried them nine months ago for a day before giving up, or having such strict coding-style preferences that they burn time adjusting generated code to their tastes and blaming the AI, even though the changes are non-functional and they didn't bother to encode their preferences into rules.

The productivity gains are blatantly obvious at this point, even in large distributed codebases, from junior to senior engineers.

KronisLV · today at 7:04 AM

I'm currently working across five projects (it was four last week, but you know how it is). I now do more in days than others might in a week.

Yesterday a colleague didn't quite manage to implement a loading container with a Vue directive instead of DOM hacks; it was easier for me to just throw AI at the problem and produce a working, tested solution with developer docs than to have another similarly long meeting and have them iterate for hours.

Then I got back to training a CNN to recognize crops from space (ploughing and mowing will need to be estimated alongside inference, since there are no markers in the training data, but one can look at BSI changes, for example), deployed a new version of an Ollama/OpenAI/Anthropic proxy that can work with AWS Bedrock and updated the docs-site instructions, deployed a new app that will have a standup bot and on-demand AI code review (LiteLLM and Django), and am working on codegen to migrate some Oracle Forms that have otherwise been stagnating.

It's not funny how overworked I am, and sure, I still have to babysit parallel Claude Code sessions and sometimes test things manually and write out changes, but this is completely different work compared to two or three years ago.

Maybe the problem spaces I’m dealing with are nothing novel, but I assume most devs are like that - and I’d be surprised at people’s productivity not increasing.

When people nag in meetings about needing to change something in a codebase, or about not knowing how to implement something and its value-add, I'll often have something working shortly after the meeting is over (having started during it).

Instead of sending "add Vitest" to the backlog graveyard, I had it integrated and running in one or two evenings with about 1,200 tests (and fixed some bugs along the way). Instead of talking about hypothetical Oxlint and Oxfmt performance improvements, I had both benchmarked against ESLint and Prettier within the hour.

Same for making server config changes with Ansible that I previously skipped due to the friction; it is mostly just gone (as long as I plan in some free time in case things get fucked up and I need to fix them).

Edit: oh, and in my free time I built a Whisper + VLM + LLM pipeline based on OpenVINO, so that I can feed it hours-long stream VODs and get an EDL cut to a desired length, which I can then import into DaVinci Resolve and work on video editing after the first basic editing prepass is done (also PySceneDetect and some audio alignment to prevent bad cuts). And then I integrated it with subscription Claude Code, not just LiteLLM and cloud providers with per-token costs, for the actual cut-making part (scene descriptions and audio transcriptions stay local, since those don't need a complex LLM, but it can use the cloud for cuts).

Oh, and I'm moving from my Contabo VPSes to a Hetzner Server Auction machine that now runs Proxmox with VMs inside, except this time around I'm moving to Ansible for managing it instead of manual scripts. I'm also migrating from Docker Swarm to plain Docker Compose + Tailscale networks (maybe Headscale later), and using more upstream containers where needed instead of trying to build all of mine myself, since storage isn't a problem and consistency isn't that important. At the same time I migrated from Drone CI to Woodpecker CI, and from Nexus to Gitea Packages, since I'm already using Gitea and Nexus is a maintenance burden.

If this becomes the new “normal” in regards to everyone’s productivity though, there will be an insane amount of burnout and devaluation of work.

truetraveller · today at 5:49 AM

You were probably deficient in RESEARCH skills before. No offense, since I was also like this once. LLMs do the research and put the results on the plate. Yes, for people who were deficient in research skills, I can see 2-3x improvements.

Note 1: I have "expert"-level research skills. LLMs still help me in research, but the boost is probably 1.2x max.

Note 2: By research, I mean Googling, GitHub search, forum search, etc., and quickly testing using JSFiddle/CodePen, etc.
