Comment sections on AI threads tend to split into "we're all cooked" and "AI is useless." I'd like to cut through the noise and learn what's actually working and what isn't, from concrete experience.
If you've recently used AI tools for professional coding work, tell us about it.
What tools did you use? What worked well and why? What challenges did you hit, and how (if at all) did you solve them?
Please share enough context (stack, project type, team size, experience level) for others to learn from your experience.
The goal is to build a grounded picture of where AI-assisted development actually stands in March 2026, without the hot air.
Using Claude Code professionally for the last 2 months (Max plan) at Rhoda AI and love it!
Software Engineering has never been more enjoyable.
Python, C++, Docker, ML infra, frontend, robotics software
I have 5 concurrent Claude Code sessions on the same mono repo.
Thank you Anthropic!
I work in HPC and I’ve found it very useful in creating various shell scripts. It really helps if you have linters such as shellcheck.
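For readers who haven't used shellcheck: a lot of its value is catching quoting bugs that only bite on unusual inputs. A minimal illustrative sketch (not from the commenter's scripts) of the classic unquoted-expansion bug, shellcheck rule SC2086:

```shell
#!/usr/bin/env bash
# Illustrative example of the kind of bug shellcheck (SC2086) flags:
# an unquoted variable expansion word-splits on paths with spaces.
set -euo pipefail

dir="$(mktemp -d)/results of run 1"   # a path containing spaces
mkdir -p "$dir"
printf 'data\n' > "$dir/log.txt"

# rm -f $dir/log.txt    # unquoted: expands to several words, wrong targets
rm -f "$dir/log.txt"    # quoted: removes exactly the intended file

echo "ok"
```

LLM-generated shell tends to make exactly this class of mistake, which is why a linter in the loop pays off.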
Other areas of success have been just offloading the typing/prototyping. I know exactly how the code should look, so I rarely run into issues.
I enjoy Opus on personal projects. I don't even bother to check the code. Go/JavaScript/TypeScript/CSS work very well for me. Swift, not so much. I haven't tried C/C++ yet. Scala was OK.
Professionally I hardly use the tools for coding, since I’m in an architecture role and mostly write design docs and do reviews. And I write the occasional prototype.
I have started building tools to integrate copilot (Opus) better with $CORP. This way I can ask it questions across confluence and github.
Leveraging Claude for a project feels very addictive to me. I have to make a conscious effort to stop and I end up working on multiple projects at the same time.
It's been great - I work on a lot of projects that are essentially prototypes, to test out different ideas. It's amazing for this - I can create web apps in a day now, which in the past I would not have been able to create at all, as I spent most of my career on the backend.
Very hit or miss.
Stack: Go, Python. Team size: 8. Experience: mixed.
I'm using a code review agent which sometimes catches a critical bug humans miss, so that is very useful.
Using it to get to know a code base is also very useful. Questions like 'which functions touch this table' or 'describe the flow of this API endpoint' are usually answered correctly. This is a huge time saver when I need to work on a code base I'm less familiar with.
For coding, agents are fine for simple, straightforward tasks, but I find the tools very myopic: they prefer very local changes (adding new helper functions all over the place, even when such helpers already exist).
For harder problems I find agents get stuck in loops, and coming up with the right prompts and guardrails can be slower than just writing the code.
I also hate how slow and unpredictable the agents can be. At times it feels like gambling. Will the agents actually fix my tests, or fuck up the code base? Who knows, let's check in 5 minutes.
IMO the worst thing is that juniors can now come up with large change sets that seem good at a glance but then turn out to be fundamentally flawed, and it takes tons of time to review.
I’m transitioning from AI assisted (human in the loop) to AI driven (human on the loop) development. But my problems are pretty niche, I’m doing analytics right now where AI-driven is much more accessible. I’m in a team of three but so far I’m the only one doing the AI driven stuff. It basically means focusing on your specification since you are handing development off to the AI afterwards (and then a review of functionality/test coverage before deploying).
Mostly using Gemini Flash 3 at a FAANG.
On my side I have used Claude code, tbh for solo projects it's good enough if you already know what you need to do.
Answering your questions:
At my job we've been pushed to use GH Copilot everywhere we can. It's been configured to review PRs, make corrections, etc. I'd say it's good enough, but from time to time it raises false positives on issues. It works fine, but you still need to keep an eye on the generated BS.
I've seen coworkers show me amazing stuff done with agentic coding, and I've seen coworkers open up slop PRs with a bunch of garbage generated code, which is kind of annoying, but I'll let it slide...
Stack - .NET, Angular, SQL Server and ofc hosted in Azure.
The team is composed of about 100 engineers (devs, QA, devops, etc.), and from what I can see there are no juniors, which is sad if you ask me.
Originally my workflow was:
- Think about requirement
- Spend 0-360 minutes looking through the code
- Start writing code
- Realize I didn't think about it quite enough and fix the design
- Finish writing code
- Write unit tests
- Submit MR
- Fix MR feedback
Until recently no LLM was able to properly disrupt that, however the release of Opus 4.5 changed that.
Now my workflow is:
- Throw as much context into Opus as possible about what I want in plan mode
- Spend 0-60 minutes refining the plan
- Have Opus do the implementation
- Review all the code and nitpick small things
- Submit MR
- Implement MR feedback
AI-assisted research is a solid A already, at least if you are doing greenfield work. The only thing blocking the horizon is tooling that requires a GUI, and even that is a small enough obstruction for most researchers.
You guys are definitely missing out. I have the perfect army of mid-level engineers. Using Codex lately, my own CPU and RAM are the only things holding me back from spinning up more and more agents.
I'm a manager at a large consumer website. My team and I have built a harness that uses headless Claudes (running Opus) to do ticket work, respond to and fix PR comments, and fix CI test failures. Our only interaction with code is writing specs in Jira tickets (which we primarily do via local Claudes) and adding PR comments to GitHub PRs.
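For readers wondering what such a harness looks like mechanically: Claude Code has a non-interactive print mode (`claude -p "prompt"`), so a CI step can be little more than a shell wrapper around it. The sketch below is purely hypothetical and not the commenter's setup; it only builds and prints the command (a dry run), and the ticket-file convention is an assumption for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of one step in a headless-agent harness, NOT the
# commenter's actual system. Assumes Claude Code's print mode
# ("claude -p") is available; we only echo the command (dry run)
# instead of executing it.
set -euo pipefail

ticket_file="${1:-ticket.md}"   # spec exported from the tracker (illustrative name)

cmd=(claude -p "Implement the ticket in ${ticket_file}, run the tests, and open a PR.")

# Dry run: print the command a CI job would execute.
printf '%q ' "${cmd[@]}"
printf '\n'
```

A real harness would add timeouts, sandboxing, and a gate that a human still approves the resulting PR.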
The speed we can move at is astounding. We're going to finish our backlog next quarter. We're conservatively planning on launching 3x as many features next quarter.
Claude is far from perfect: it's made us reassess our coding standards since code is primarily for Claude now, not for humans. So much of what we did was to make code easier for the next dev, and that just doesn't matter anymore.
Going back and forth with an AI all day is psychologically draining, as is checking its output with a fine-tooth comb.
Einstein said something like: "To punish my disdain for authority, God made me an authority." I feel like, to punish my disdain for dev managers, techbro Jesus has made me a dev manager, of AI agents.
The output the agent creates falls into one of these categories:
1. Correct, maintainable changes
2. Correct, not maintainable changes
3. Correct diff, maintains expected system interaction
4. Correct diff, breaks system interaction
In no way are they consistent or deterministic, but they are _always_ convinced they are correct.
I regularly get something like 12 merge requests coming in within the span of an hour. We even employed an agent to do code reviews as part of the CI process. It catches bugs contained in the diff, but it has never caught bugs in system interactions. That burden is still on me.
Management is forcing experimentation with and adoption of these tools (as if I have free bandwidth beyond planned work), and if I raise valid concerns about the functional or safety gaps, I get talked down to like I am a luddite opposing change.
For things I don't need to maintain and that are small, like tools for work or personal projects, it has been very fun. I can explore ideas and debate whether they are worth pursuing. I wish I could say the same about the control and data plane that I have to maintain and that customers expect 99% uptime from.
My current employer is taking a long time to figure out how they think they want people to use it, meanwhile, all my side projects for personal use are going quite strong.
It's useful. At my company we have an internal LLM that tends to be used in lieu of searching the web, to avoid unintentionally leaking information about what we are working on to third parties. This includes questions about software development, including generation of code. For various reasons we are not permitted to copy this verbatim, but can use it for guidance - much like, say, inspiration from Stack Overflow answers.
I am getting disproportionately good results with the models by following a process: spec -> plan -> critique -> improve plan -> implement plan.
It has reignited my passion for coding by making it so I don't have to use my coding muscle as much during the day to improve our technologically boring product.
At the end of the day, I'm being paid to ensure that the code deployed to production meets a particular bar of quality. Regardless of whether I'm reviewing code or writing it, If I let a commit be merged, I have to be convinced that it is a net positive to the codebase.
People having easy access to LLMs makes this job much harder. LLMs can create what looks at the surface like expert-written code, but suffers from below-the-surface issues that will reveal themselves as intermittent issues or subtle bugs after being deployed.
Inexperienced devs create huge commits full of such code, and then expect me to waste an entire day searching for such issues, which is miserable.
If the models don't improve significantly in the future, I expect that most high-stakes software teams will fire all the inexperienced devs and have super-experienced engineers work with the bots directly.
One thing I use Claude for is diagramming system architecture stuff in LateX and it’s great, I just describe what I am visualizing and kaboom I get perfect output I can paste into overleaf.
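For anyone curious what that output looks like, an architecture diagram in LaTeX is typically a handful of TikZ nodes and arrows. A minimal hand-written illustration (not actual Claude output; node names are made up) that compiles in Overleaf:

```latex
% Minimal architecture-diagram sketch (hand-written illustration,
% not model output). Requires TikZ and its positioning library.
\documentclass{standalone}
\usepackage{tikz}
\usetikzlibrary{positioning}
\begin{document}
\begin{tikzpicture}[box/.style={draw, rounded corners, minimum width=2cm}]
  \node[box] (client) {Client};
  \node[box, right=2cm of client] (api) {API gateway};
  \node[box, right=2cm of api] (db) {Database};
  \draw[->] (client) -- node[above] {HTTPS} (api);
  \draw[->] (api) -- node[above] {SQL} (db);
\end{tikzpicture}
\end{document}
```

Describing boxes and arrows in prose and letting the model emit this kind of boilerplate is exactly the low-risk, easy-to-verify use case.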
I also work at big tech. Claude code is very good and I have not written code by hand in months. These are very messy codebases as well.
I have to very much be in the loop and constantly guiding it with clarifying questions but it has made running multiple projects in parallel much easier and has handled many tedious tasks.
I almost don't write any code by hand anymore and was able to take up a second part-time job thanks to genai.
Running faster does not mean anything unless you know where you are going.
I have the freedom to work with AI tools as much as I as I want and kind of lead the team in the direction I see fit.
It’s a lot of fun for exploring ideas. I’ve built things very fast that I would not have done at all otherwise. I have rewritten a huge chunk of semi-outdated docs into something useful with a couple of Prompts in a day. Claude does all the annoying dependency update breaks the build kinds of things. And the reviews are extremely useful and a perfect combination with human review as they catch things extremely well that humans are bad at catching.
But in the production codebase, changes must be made with much more consideration. Claude tends to perform well on some tasks, but for others I end up wasting time because I just don't know up front how the feature must look, so I cannot write a spec at the level of precision that Claude needs, and changing code manually is more efficient for this kind of discovery for me than dealing with large chunks of constantly changing code.
And then there's the fact that Claude produces things that work and do the thing described in the prompt extremely well, but they are always also wrong in some way. When I let AI build a large chunk of code and actually go through it, there's always a mess somewhere that AI review doesn't see, because it looks completely plausible but contains some horrible security issue, or a complete inconsistency with the rest of the codebase, or, you know, that custom YAML parser nobody asked for and that you don't want your day job to depend on.
Half the answers: yes, we move so fast, I haven't seen a text editor in months.
The other half: I keep fixing everything from other teams using AI, otherwise I destroy my career.
This thread is very eye-opening on how things are going.
Exceptionally well. I’ve been using it for my side project for the last 7 months and have learned how to use it really well on a rather large codebase now. My side project has about 100k LOC and most of it is AI generated, though I do heavily review and edit.
> If you've recently used AI tools for professional coding work, tell us about it.
POCC (Plain Old Claude Code). Since the 4.5 models, it does 90% of the work. I do a final round of tinkering and polishing for the PR, because by this point it is easier for me to fix the code than to ask the model to fix it for me.
The work: fairly straightforward UI + backend work on a website. We have designers producing Figma files, and we use the Figma MCP to convert them to web pages.
POCC reduces the time taken to complete the work by at least 50%. The last-mile problem exists: it's not a one-shot story-to-PR prompt. There are a few back-and-forths with the model, some direct IDE edits, offline tests, etc. I can see how having subagents/skills/hooks/memory could reduce the manual effort further.
Challenges:
1) AI-first documentation: stories have to be written with greater detail and acceptance criteria.
2) Code reviews: Copilot reviews on GitHub are surprisingly insightful, but waiting on human reviews is still a bottleneck.
3) AI-first thinking: some of the lead devs are still hung up on certain best practices that are not relevant in a world where the machine generates most of the code. There is friction between the code the LLM is good at producing and the standards expected from an experienced developer. This creates busy work at best, frustration at worst.
4) Anti-AI sentiment: there is a vocal minority who oppose AI for reasons ranging from craftsmanship to capitalism to the global environmental crisis. It is a bit political, and the Slack channels are getting interesting.
5) Prompt engineering: I'm in the EU; when the team is multilingual and English is adopted as the language of communication, some members struggle more than others.
6) Losing the will to code: I can't seem to make up my mind whether the tech is like the invention of the calculator or the creation of social media. We don't know its long-term impact on producing developers who can code for a living.
Personally, I love it. I mourn the loss of the 10x engineer, but those 10x guys have already boarded the LLM ship.
What you said about "we're all cooked" and "AI is useless" is literally me and everyone I know switching between the two on an hourly basis...
I find it the most exciting time for me as a builder, I can just get more things done.
Professionally, I'm dreading for our future, but I'm sure it will be better than I fear, worse than I hope.
From a toolset, I use the usual, Cursor (Super expensive if you go with Opus 4.6 max, but their computer use is game changing, although soon will become a commodity), Claude code (pro max plan) - is my new favorite. Trying out codex, and even copilot as it's practically free if you have enterprise GitHub. I'm going to probably move to Claude Code, I'm paying way too much for Cursor, and I don't really need tab completion anymore... once Claude Code will have a decent computer use environment, I'll probably cancel my Cursor account. Or... I'll just use my own with OpenClaw, but I'm not going to give it any work / personal access, only access to stuff that is publicly available (e.g. run sanity as a regular user). Playing with skills, subagents, agent teams, etc... it's all just markdowns and json files all the way down...
About our professional future:
I'm not going to start learning to be a plumber / electrician / A/C repair etc, and I am not going to recommend my children to do so either, but I am not sure I will push them to learn Computer Science, unless they really want to do Computer Science.
What excites me the most right now is my experiments with OpenClaw / NanoClaw, I'm just having a blast.
tl;dr most exciting yet terrifying times of my life.
5 years ago, I set out to build an open-source, interoperable marketplace. It was very ambitious because it required me to build an open source Shopify for not only e-commerce, but restaurants, gyms, hotels, etc that this marketplace could tap into. Without AI, this vision would’ve faltered but with AI, this vision is in reach. I see so many takes about AI killing open-source but I truly think it will liberate us from proprietary SaaS and marketplaces given enough time.
I've never been more productive. If only I had a job...
Context: I work in robotics. We use mostly c++ and python. The entire team is about 200 though the subset I regularly interact with is maybe 50.
I basically don't use AI for coding at all. When I have tried it, it's just half working garbage and trying to describe what I want in natural language is just miserable. It feels like trying to communicate via smoke signals.
I'll be a classical engineer until they fire me and then go do something else. So far, that's working. We've had multiple rounds of large layoffs in the last year and somehow I'm still here.
Very powerful tool. In the right hands.
I work for a university. We've got a dedicated ChatGPT instance but haven't been approved to use a harness yet. Some devs did a pilot project and approval/licenses are supposedly coming soon.
I still like using it for quick references, autocomplete, and boilerplate functions. It's funny that text completion with tab is now seen as totally obsolete by some folks.
I use it all the time now, switching between claude code, codex, and cursor. I prefer CC and codex for now but everyone is copying everyone else's homework.
I do a lot of green field research adjacent work, or work directly with messy code from our researchers. It's been excellent at building small tools from scratch, and for essentially brute forcing undocumented code. I can give it a prompt like "Here is this code we got from research, the docs are 3 months out of date and don't work, keep trying things until you manage to get $THING running".
Even for more production and engineering related tasks I'm finding it speeds up velocity. But my engineering is still closer to greenfield than a lot of people here.
I do however feel less connected to the code, even when reviewing thoroughly, I feel like I internalize things at a high level, rather than knowing every implementation detail off the dome.
The other downside is I get bigger and more frequent code review requests from colleagues. No one is just handing me straight-up slop (yet...)
Measured 12x increase in issues fixed/implemented. Solo founder business, so these are real numbers (over 2 months), not corporate fakery. And no, I am not interested in convincing you, I hope all my competitors believe that AI is junk :-)
It’s not really that useful for what people tell me it will be useful for, most of the time. For context, I am a senior engineer that works in fintech, mostly doing backend work on APIs and payment rail microservices.
I find the most use from it as a search engine the same way I’d google “x problem stackoverflow”.
When I was first tasked with evaluating it for programming assistance, I thought it was a good “rubber duck” - but my opinion has since changed. I found that if I documented my goals and steps, using it as a rubber duck tended to lead me away from my goals rather than refine them.
Outside of my role they can be a bit more useful and generally impressive when it comes to prompting small proof of concept applications or tools.
My general take on the current state of LLMs for programming in my role is that they are like having a junior engineer that does not learn and has a severe memory disorder.
Here's my anecdote: I use ChatGPT, Gemini (web chat UI) and Claude. Claude is a bit more convenient in that it has access to my code bases, but this comes at the cost of having to be careful that I'm steering it correctly, while with the chat bots I can feed them only the correct context.
They simplify discrete tasks. Feature additions, bug fixes, augmenting functionality.
They are incapable of creating good quality (easily expandable etc) architecture or overall design, but that's OK. I write the structs, module layout etc, and let it work on one thing at a time. In the past few days, I've had it:
- Add a ribbon/cartoon mesh creator
- Fix a logical vs physical pixel error on devices where they differed, for positioning text and setting window size
- Fix a bug with selecting things with the mouse under specific conditions
- Fix the BLE advertisement payload format when integrating with a service
- Input tax documents for stock sales, from the PDF my broker gives me to the CSV format the tax software uses
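The logical-vs-physical pixel bug in that list is a common class of error: on a HiDPI display the two coordinate spaces differ by the device scale factor, and positioning code must convert consistently. A minimal sketch of the conversion (function names are illustrative, not from any particular framework):

```python
# Illustrative sketch of logical vs physical pixel conversion.
# On a HiDPI screen, physical = logical * scale_factor; mixing the
# two spaces misplaces text and mis-sizes windows.

def logical_to_physical(x: float, y: float, scale: float) -> tuple[int, int]:
    """Convert logical (UI) coordinates to physical (device) pixels."""
    return round(x * scale), round(y * scale)

def physical_to_logical(px: int, py: int, scale: float) -> tuple[float, float]:
    """Convert physical pixels back to logical coordinates."""
    return px / scale, py / scale

# A 400x300 logical window on a 2x display is 800x600 physical pixels.
assert logical_to_physical(400, 300, 2.0) == (800, 600)
assert physical_to_logical(800, 600, 2.0) == (400.0, 300.0)
```

Bugs of this shape are well suited to LLM fixes precisely because the invariant is simple to state and easy to test.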
Overall, great tool! But I think a lot of people are lying about its capabilities. I measured my output:
- 1.5x more commits.
- 2x more issues closed.
The commits are real. I'm not doing "vibe coding" or even agentic coding. I'm doing turn-by-turn where I micromanage the LLM, give specific implementation instructions, and then read and run the output before committing the code.
I'm more than happy with 2x issues closed. For my client work it means my wildly optimistic programmer estimates are almost accurate now.
I did have a frustrating period where a client was generating specs using ChatGPT. I was simply honest: "I have no idea what this nonsense means, let's meet to discuss the new requirements." That worked.
Somewhat against the common sentiment, I find it's very helpful on a large legacy project. At work, our main product is a very old, very large code base. This means it's difficult to build up a good understanding of it -- documentation is often out of date, or makes assumptions about prior knowledge. Tracking down the team or teams that can help requires being very skilled at navigating a large corporate hierarchy. But at the end of the day, the answers for how the code works are mostly in the code itself, and this is where AI assistance has really been shining for me. It can explore the code base and find and explain patterns and available methods far faster than I can.
My prompts tend to follow the pattern "I am looking to implement <X>. <Detailed description of what I expect X to do.> Review the code base to find similar examples of how this is currently done, and propose a plan for how to implement this."
These days I'm on Claude Code, and I do that first part in Plan mode, though even a few months ago on earlier, not-as-performant models and tools, I was still finding value with this approach. It's just getting better, as the company is investing in shared skills/tools/plugins/whatever the current terminology is that is specific to various use cases within the code base.
I haven't been writing so much code directly, but I do still very much feel that this is my code. My sessions are very interactive -- I ask the agent to explain decisions, question its plans, review the produced code and often revise it. I find it frees me up to spend more time thinking through and having higher level architecture applied instead of spending frustrating hours hunting down more basic "how does this work" information.
I think it might have been an article by Simon Willison that made the case for there being a way to use AI tooling to make you smarter, or to make you dumber. Point and shoot and blindly accept output makes you dumber -- it places more distance between you and your code base. Using AI tools to automate away a lot of the toil give you energy and time to dive deeper into your code base and develop a stronger mental model of how it works -- it makes you smarter. I keep in mind that at the end of the day, it's my name on the PR, regardless of how much Claude directly created or edited the files.
I haven't actively looked into it, but on a couple of occasions after Google began inserting Gemini results at the top of the list, I decided to try using some of the generated code samples when the search didn't turn up anything useful. The results were a mixed bag: the libraries I'd been searching for examples from were not very broadly used, and their interfaces were volatile enough that in some cases the model was returning results for obsolete versions. Not a huge deal, since the canonical docs had some recommendations. In at least a couple of cases, though, the results included references to functions that had never been in the library at all, even though they sounded not only plausible but would have been useful if they did in fact exist.
In the end, I am generally using the search engine to find examples because I am too lazy to look at the source for the library I'm using, but if the choice is between an LLM that fabricates stuff some percentage of the time and just reading the fucking code like I've been doing for decades, I'd rather just take my chances with the search engine. If I'm unable to understand the code I'm reading enough to make it work, it's a good signal that maybe I shouldn't be using it at all since ultimately I'm going to be on the hook to straighten things out if stuff goes sideways.
Ultimately that's what this is all about- writing code is a big part of my career but the thing that has kept me employed is being able to figure out what to do when some code that I assembled (through some combination of experimentation, documentation, or duplication) is not behaving the way I had hoped. If I don't understand my own code chances are I'll have zero intuition about why it's not working correctly, and so the idea of introducing a bunch of random shit thrown together by some service which may or may not be able to explain it to me would be a disservice to my employers who trust me on the basis of my history of being careful.
I also just enjoy figuring shit out on my own.
From this thread, so far it seems:
Net negative for the ones who care and still need to work closely with others
Net positive for the ones who don't and/or are lone wolves
Maybe the future is lone wolves working on their thing without a care in the world. Accountable to no one but themselves. Bus factor dialed up to 11.
I feel bad saying this because so many folks have not had the best of luck, but it's changed the game for me.
I'm building out large multi-repo features in a 60 repo microservice system for my day job. The AI is very good at exploring all the repos and creating plans that cut across them to build the new feature or service. I've built out legacy features and also completely new web systems, and also done refactoring. Most things I make involve 6-8 repos. Everything goes through code review and QA. Code being created is not slop. High quality code and passes reviews as such. Any pushback I get goes back in to the docs and next time round those mistakes aren't made.
I did a demo of how I work with AI to the dev team at Math Academy, who were complete skeptics before the call; two hours later they were converts.
I’d be more curious to hear about the processes people have put in place for AI code reviews
On the one hand, past some threshold of criticality/complexity, you can't push AI-written code unreviewed; on the other, you can't relegate your best senior engineers to doing nothing but reviewing code.
It doesn't just fail to scale; it makes their lives miserable.
So then, what’s the best approach?
I think over time that threshold I mentioned will get higher and higher, but at the moment the ratio of code that needs to be reviewed to reviewers is a little high
It's a game changer for reading large codebases and debugging.
Error messages were the "slop" of the pre-LLM era. This is where an LLM shines, filling in the gaps where software engineering was neglected.
As for writing code, I don't let it generate anything that I couldn't have written myself, or anything that I can't keep in my brain at once. Otherwise I get really nervous about committing.
The job of a software engineer does and always has relied upon taking responsibility for the quality of one's work. Whether it's auto-complete or a fancier auto-complete, the responsibility should rest on your shoulders.
FAANG colleague writes this week -- "I am currently being eaten alive by AI stuff for my non-(foss-project) work. I spend most of my day slogging through AI generated comments and code trying to figure out what is good, not good, or needs my help to become good. Or I'm trying to figure out how to prompt the tools to do what I want them to do"
This fellow is one of the few mature software engineers I have ever met who is rigorously and consistently productive in a very challenging, mature code base, year in and year out. Or WAS... yes, this is from coughgooglecough in California.
It is a very mixed bag. I have enjoyed using opus 4.5 and 4.6 to add functionality to existing medium complexity codebases. It’s great for green field scripts and small POCs. I absolutely cannot stand reviewing the mostly insane PRs that other people generate with it.
I use Claude Code for my research projects now; it's incredible, tbh. I'm not writing production code for millions of users. I need to do data science work and write lots of code to do that, and AI lets me focus on the parts of my research that I want to do and makes me a lot more productive.
I'm repeating this for the third time, but a non-technical client of mine has whipped up an impressive SaaS prototype with tons of features. They still need help with the cleanup (it's all slop), but I used to do many small coding requests for that client. Those gigs will simply disappear.
I just got started using Claude very recently. I had not been in the loop on how much better it got. Now it's obvious that no one will write code by hand. I genuinely fear for my ability to make a living as soon as 2 years from now, if not sooner. I figure the only way is to enter the red-queen race and ship some good products. That is the positive I see: if I put 30 h/week into something, I have the productivity of 3 people. If it's a weekend project at 10 h/week, I now get what used to be a full week of productivity. The economics of developing products solo have vastly changed for the better.
Things I’ve learned:
Claude Code is the best CLI tool by a mile.
Even at its best it's wildly inconsistent from session to session. It does things differently every time. Sometimes I'm impressed with how it works; then the next day, doing the exact same thing, it flips out and goes nuts, trying to do the same thing a totally different, unworkable way.
You can capture some of these issues in AGENTS.md files or the like, but there’s an endless future supply of them. And it’s even inconsistent about how it “remembers” things. Sometimes it puts in the project local config, sometimes in my personal overall memory files, sometimes instead of using its internal systems, it asks permission to search my home directory for its memory files.
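For readers who haven't used these files: an AGENTS.md (or CLAUDE.md) is just project-level markdown the agent reads at session start. A hypothetical sketch of the kind of entries that capture recurring corrections (all contents, commands, and paths below are illustrative, not from the commenter's project):

```markdown
# AGENTS.md (illustrative example)

## Build & test
- Run tests with `make test`, never by invoking the test runner directly.

## Conventions
- Reuse helpers in `internal/util` before adding new ones.
- Do not add new dependencies without asking first.

## Known pitfalls
- The staging config is generated; edit `config.tmpl`, not `config.yaml`.
```

The commenter's point stands, though: this only captures the issues you've already seen, and the supply of new ones is endless.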
The best way to use it is for throwaway scripts or examples of how to do something. Or new, small projects where you can get away with never reading the code. For anything larger or more important, its inconsistencies make it a net time loser, imo. Sure, let it write an annoying utility function for you, but don’t just let it loose on your code.
When you do use it for new projects, make it plan out its steps in advance. Provide it with a spec full of explicit usage examples of the functionality you want. It’s very literal, so expect it to overindex on your example cases and treat those as most important. Give it a list of specific libraries or tools you want it to use. Tell it to take your spec and plan out its steps in a separate file. Then tell it to implement those steps. That usually works to allow it to build something medium-complex in an hour or two.
When your context is filling up in a session in a particular project, tell it to review its CLAUDE.md file and make sure it matches the current state of the project. This will help the next session start smoothly.
One of the saddest things I’ve found is when a whole team of colleagues gets obsessed with making Claude figure something out. Once it’s in a bad loop, you need to start over, the context is probably poisoned.
I've been working on a client-server Unity-based game for the last couple of years. It's pretty bad at handling that use case. It misses tons of corner cases that span the client-server divide.