What bothers me about posts like this is: mid-level engineers are not tasked with atomic, greenfield projects. If all an engineer did all day was build apps from scratch, with no expectation that others may come along and extend it, build on top of it, or depend on it, then sure, Opus 4.5 could replace them. The hard thing about engineering is not "building a thing that works", it's building it the right way, in an easily understood way, in a way that's easily extensible.
No doubt I could give Opus 4.5 "build me a XYZ app" and it will do well. But day to day, when I ask it "build me this feature" it uses strange abstractions, and often requires several attempts on my part to do it in the way I consider "right". Any non-technical person might read that and go "if it works it works", but any reasonable engineer will know that that's not enough.
There are two types of right/wrong ways to build: the context specific right/wrong way to build something and an overly generalized engineer specific right/wrong way to build things.
I've worked on teams where multiple engineers argued about the "right" way to build something. I remember thinking that they had biases based on past experiences and assumptions about what mattered. It usually took an outsider to proactively remind them what actually mattered to the business case.
I remember cases where a team of engineers built something the "right" way but it turned out to be the wrong thing. (A well-engineered thing no one ever used.)
Sometimes hacking something together messily to confirm it's the right thing to be building is the right way. Then making sure it's secure, then finally paying down some technical debt to make it more maintainable and extensible.
Where I see really silly problems is when engineers over-engineer from the start, before it's clear they are building the right thing, or when management never lets them clean up the code base to make it maintainable or extensible once it's clear it is the right thing.
There's always a balance/tension, but it's when things go too far one way or another that I see avoidable failures.
Anecdata, but I’ve found Claude Code with Opus 4.5 able to do many of my real tickets in real mid-size and large codebases at a large public startup. I’m at senior level (15+ years). It can browse and figure out the existing patterns better than some engineers on my team. It used a few rare features in the codebase that even I had forgotten about and was about to duplicate. To me it feels like a real step change from the previous models I’ve used, which I found at best useless. It’s following style guides and existing patterns well, not just greenfield. Kind of impressive, kind of scary.
Another thing that gets me with projects like this: there are already many examples of image converters, minesweeper clones, etc. that you can just fork on GitHub. The value of the LLM here is largely just stripping the copyright off.
The same exists in humans too. I worked with a developer who had 15 years of experience and was a tech lead at a big Indian firm. We started something together, and three months back when I checked the tables I was shocked to see how badly he had fucked up the DB. In the end the only option left for me was to quit, because I knew it would break in production, and if I onboarded a single customer my life would be screwed. He mixed many things into the frontend, offloaded even permissions to the frontend, and literally copied tables across multiple DBs (we had 3 services). I still cannot believe he worked as a tech lead for 15 years. Each DB had more than 100 tables, and of those, 20-25 were duplicates. He never shared code with me, but I smelled something fishy when bug fixing became a never-ending loop and my frontend guy told me he couldn't do it anymore. The only mistake I made was trusting him, and the worst part is he's my cousin, and the relationship turned sour after I confronted him and decided to quit.
> The hard thing about engineering is not "building a thing that works", it's building it the right way, in an easily understood way, in a way that's easily extensible.
You’re talking like in the year 2026 we’re still writing code for future humans to understand and improve.
I fear we are not doing that. Right now, Opus 4.5 is writing code that later Opus 5.0 will refactor and extend. And so on.
A greenfield project is definitely 'easy mode' for an LLM; especially if the problem area is well understood (and documented).
Opus is great and definitely speeds up development even in larger code bases, and it is reasonably good at matching coding style/standards to those of the existing code base.
In my opinion, the big issue is the relatively small context window, which is quickly overwhelmed when the model is given a larger task on a large codebase.
For example, I have a largish enterprise grade code base with nice enterprise grade OO patterns and class hierarchies. There was a simple tech debt item that required refactoring about 30-40 classes to adhere to a slightly different class hierarchy. The work is not difficult, just tedious, especially as unit tests need to be fixed up.
I threw Opus at it with very precise instructions as to what I wanted it to do and how I wanted it done. It started off well but then disintegrated once it got overwhelmed by the sheer number of files it had to change. At some point it got stuck in some kind of error loop where one change it made contradicted another, and it just couldn't work itself out. I tried stopping it and helping it out, but at that point the context was so polluted that it just couldn't see a way out. I'd say that once an LLM can handle more 'context' than a senior dev with good knowledge of a large codebase, LLMs will be viable in a whole new realm of development tasks on existing code bases. That 'too hard to refactor this/make this work with that' task will suddenly become viable.
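For the curious, here's a rough sketch (in Python, with invented class names) of the kind of "slightly different class hierarchy" change I mean: mechanical on its own, but multiplied across 30-40 classes and their tests.

```python
# Hypothetical illustration only: moving concrete handlers off a fat legacy
# base class and onto a slimmer abstract interface, repeated for every handler.
from abc import ABC, abstractmethod

# Before: every handler inherits a heavyweight base with mixed concerns.
class LegacyHandlerBase:
    def __init__(self, config: dict):
        self.config = config

    def handle(self, payload: dict) -> dict:
        raise NotImplementedError

# After: a slim abstract interface; shared setup moves into each concrete class.
class Handler(ABC):
    @abstractmethod
    def handle(self, payload: dict) -> dict: ...

class InvoiceHandler(Handler):
    def __init__(self, config: dict):
        self.config = config  # setup no longer inherited from the legacy base

    def handle(self, payload: dict) -> dict:
        return {"status": "ok", "invoice_id": payload.get("id")}
```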
One thing I've been tossing around in my head is:
- How quickly is the cost of refactoring to a new pattern with functional parity going down?
- How does that change the calculus around tech debt?
If engineering uses 3 different abstractions in inconsistent ways that leak implementation details across components and duplicate functionality in ways that are very hard to reason about, that is, in conventional terms, an existential problem that might kill the entire business: all dev time ends up consumed by bug fixes and pointless complexity, velocity falls to nothing, and the company stops being able to iterate.
But if claude can reliably reorganize code, fix patterns, and write working migrations for state when prompted to do so, it seems like the entire way to reason about tech debt has changed. And it has changed more if you are willing to bet that models within a year will be much better at such tasks.
And in my experience, Claude is imperfect at refactors and still requires review and a lot of steering, but it's one of the things it's better at, because there are clear requirements and testing workflows already built around the existing behavior. Refactoring is definitely a hell of a lot faster than it used to be, at least on the few I've dealt with recently.
In my mind it might be kind of like thinking about financial debt in a world with high inflation, in that the debt seems like it might get cheaper over time rather than more expensive.
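To make that intuition concrete, here's a toy back-of-the-envelope calculation. All of the numbers are invented; the only point is the shape of the comparison between paying debt down now versus later, if you assume agent-assisted refactors keep getting cheaper.

```python
# Toy arithmetic for the "tech debt under deflating refactor costs" intuition.
# Every number here is made up purely for illustration.

refactor_cost_today = 40.0      # engineer-days to pay the debt down now
annual_cost_decline = 0.5       # assume agent refactors get 50% cheaper per year
carrying_cost_per_year = 10.0   # extra engineer-days per year lost to the mess

for years_deferred in range(0, 4):
    future_refactor = refactor_cost_today * (1 - annual_cost_decline) ** years_deferred
    total_if_deferred = carrying_cost_per_year * years_deferred + future_refactor
    print(f"defer {years_deferred}y: total ~ {total_if_deferred:.1f} engineer-days")

# If refactor costs fall faster than the carrying cost accumulates, deferring
# looks cheaper over time -- which is the inflation-style intuition above.
```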
> If all an engineer did all day was build apps from scratch, with no expectation that others may come along and extend it, build on top of it, or depend on it, then sure, Opus 4.5 could replace them.
Why do they need to be replaced? Programmers are in the perfect place to use AI coding tools productively. It makes them more valuable.
I had Opus write a whole app for me in 30 seconds the other night. I use a very extensive AGENTS.md to guide AI in how I like my code chiseled. I've been happily running the app without looking at a line of it, but I was discussing the app with someone today, so I popped the code open to see what it looked like. Perfect. 10/10 in every way. I would not have written it that well. It came up with at least one idea I would not have thought of.
I'm very lucky that I rarely have to deal with other devs and I'm writing a lot of code from scratch using whatever is the latest version of the frameworks. I understand that gives me a lot of privileges others don't have.
Their thesis is that code quality does not matter as it is now a cheap commodity. As long as it passes the tests today it's great. If we need to refactor the whole goddamn app tomorrow, no problem, we will just pay up the credits and do it in a few hours.
>What bothers me about posts like this is: mid-level engineers are not tasked with atomic, greenfield projects
They do get those occasionally too, though. Depends on the company. In some software houses it's constant "greenfield projects", one after another. And even in companies with 1-2 pieces of main established software to maintain, there are all kinds of smaller utilities or pipelines needed.
>But day to day, when I ask it "build me this feature" it uses strange abstractions, and often requires several attempts on my part to do it in the way I consider "right".
In some cases that's legit. In other cases it's just "it did it well, but not how I'd have done it", which is often needless stickiness to some particular style (often a point of contention between two human programmers too).
Basically, what FloorEgg says in this thread: "There are two types of right/wrong ways to build: the context specific right/wrong way to build something and an overly generalized engineer specific right/wrong way to build things."
And you can always not just tell it "build me this feature", but tell it (in a high-level way) how to do it, and give it generic context about such preferences too.
Even if you are going greenfield, you need to build it the way it is likely to be used, based on a deep familiarity with what that customer's problems are and how their current workflow is done. As much as we imagine everything is on the internet, a bunch of this stuff is not documented anywhere. An LLM could ask the customer requirements questions, but that familiarity is often needed to know the right questions to ask. It is hard to bootstrap.
Even if it could build the perfect greenfield app, as it updates the app it needs to consider backwards compatibility and breaking changes. LLMs seem very far from being able to grow apps. I think this is because LLMs are trained on the final outcome of the engineering process, but not on the incremental sub-commit work of first getting a faked-out outline of the code running and then slowly building up that code until you have something that works.
This isn't to say that LLMs or other AI approaches couldn't replace software engineering some day, but they clearly aren't good enough yet, and the training sets they currently have access to are unlikely to provide the needed examples.
Yeah. Just like another engineer. When you tell another engineer to build you a feature, it's improbable they'll do it the way that you consider "right."
This sounds a lot like the old arguments around using compilers vs hand-writing asm. But now you can tell the LLM how you want to implement the changes you want. This will become more and more relevant as we try to maintain the code it generates.
But, for right now, another thing Claude's great at is answering questions about the codebase. It'll do the analysis and bring up reports for you. You can use that information to guide the instructions for changes, or just to help you be more productive.
> it's building it the right way, in an easily understood way, in a way that's easily extensible.
When I worked at Google, people rarely got promoted for doing that. They got promoted for delivering features, or sometimes for rescuing a failing project, because everyone kept doing the former until promotion velocity dropped and your good people left for other projects not yet bogged down too far.
You can look at my comment history to see evidence of how hostile I was to agentic coding. Opus 4.5 completely changed my opinion.
This thing jumped into a giant JSF (yes, JSF) codebase and started fixing things with nearly zero guidance.
After recently applying Codex to a gigantic, old, and hairy project that is as far from greenfield as it can be, I can assure you this assertion is false. It’s bonkers seeing 5.2 churn through the complexity and understand dependencies that would take me days or weeks to wrap my head around.
In my personal experience, Claude is better at greenfield, Codex is better at fitting in. Claude is the perfect tool for a "vibe coder", Codex is for the serious engineer who wants to get great and real work done.
Codex will regularly give me 1000+ line diffs where all my comments (I review every single line of what agents write) are basically nitpicks. "Make this shallow w/ early return, use | None instead of Optional", that sort of thing.
I do prompt it in detail though. It feels like I'm the person coming in with the architecture most of the time, AI "draws the rest of the owl."
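For a concrete (hypothetical) example of the kind of nitpick I mean, in Python:

```python
# Sketch of the two nitpicks mentioned above: flatten nesting with an early
# return, and prefer the "X | None" spelling over Optional[X]. The function
# itself is invented for illustration.

from typing import Optional

# Before: deep nesting, older Optional[] annotation style.
def find_owner_before(user_id: Optional[int], users: dict[int, str]) -> Optional[str]:
    if user_id is not None:
        if user_id in users:
            return users[user_id]
    return None

# After: shallow with an early return, modern "| None" annotation.
def find_owner_after(user_id: int | None, users: dict[int, str]) -> str | None:
    if user_id is None:
        return None
    return users.get(user_id)
```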
My favorite benchmark for LLMs and agents is to have it port a medium-complexity library to another programming language. If it can do that well, it's pretty capable of doing real tasks. So far, I always have to spend a lot of time fixing errors. There are also often deep issues that aren't obvious until you start using it.
Exactly. The main issue IMO is that "software that seems to work" and "software that works" can be very hard to tell apart without validating the code, yet these are drastically different in terms of long-term outcomes. Especially when there's a lot of money, or even lives, riding on these outcomes. Just because LLMs can write software to run the Therac-25 doesn't mean it's acceptable for them to do so.
Your hobby project, though, knock yourself out.
Another thing these posts assume is a single developer working on the product with a number of AI agents, not a large team. I think we need to rethink how teams work with AI. It's probably not gonna be a single developer typing a prompt, but a team somehow collaborating on a prompt or equivalent. XP on steroids? Programming by committee?
I find Opus 4.5 very, very strong at matching the prevailing conventions/idioms/abstractions in a large, established codebase. But I guess I'm quite sensitive to this kind of thing so I explicitly ask Opus 4.5 to read adjacent code which is perhaps why it does it so well. All it takes is a sentence or two, though.
On the contrary, Opus 4.5 is the best agent I’ve ever used for making cohesive changes across many files in a large, existing codebase. It maintains our patterns and looks like all the other code. Sometimes it hiccups for sure.
But... you can ask! Ask Claude to use encapsulation, or to write the equivalent of interfaces in the language you're using, to map out dependencies and duplicate features, or to maintain a dictionary of component responsibilities.
AI coding is a multiplier of writing speed, but it doesn't excuse you from planning and mapping out features.
You can have reasonably engineered code if you get models to stick to well-designed modules, but you need to tell them.
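As a minimal sketch of what "telling them" can look like in Python: the Storage interface below is hypothetical, but it's the kind of boundary you'd spell out up front and ask the model to respect.

```python
# A minimal, hypothetical module boundary to hand the model before it writes
# feature code: depend on the narrow interface, never on a concrete class.

from typing import Protocol

class Storage(Protocol):
    """Narrow interface the rest of the app is told to depend on."""
    def get(self, key: str) -> bytes | None: ...
    def put(self, key: str, value: bytes) -> None: ...

class InMemoryStorage:
    """One concrete implementation; others (disk, S3) can be swapped in."""
    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def get(self, key: str) -> bytes | None:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

def save_report(store: Storage, name: str, body: bytes) -> None:
    # Feature code goes through the interface, keeping the module swappable.
    store.put(f"reports/{name}", body)
```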
I totally agree. And welcome to the disposable software age.
> greenfield
LLMs are pretty good at picking up existing codebases. Even with cleared context they can do „look at this codebase and this spec doc that created it. I want to add feature x“
Yeah, all of those applications he shows do not really expose any complex business logic.
With all due respect: a file converter for Windows is gluing a few Windows APIs together with the relevant codec.
Now, good luck working on a complex warehouse management application where you need extremely complex logic to sort the order of picking, assembling, and packing based on an endless number of variables: weight, Amazon Prime priority, distribution centers, number and type of carts available, number and type of assembly stations available, different delivery systems and requirements for different delivery operators (such as GLE, DHL, etc.), all of which has to work for N customers, each requiring slightly different capabilities and flows, each with different printers and operations, etc. And I'm not even scratching the surface of the business logic complexity (not even mentioning functional requirements), to avoid boring the reader.
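Just to make the point concrete, here's a toy Python slice of only the pick-ordering part, with invented fields and weights; the real thing layers dozens more rules (carts, stations, per-carrier cut-offs, per-customer overrides) on top of something like this.

```python
# Toy, invented slice of warehouse pick-ordering logic, for illustration only.

from dataclasses import dataclass

@dataclass
class OrderLine:
    weight_kg: float
    prime: bool
    distribution_center: str
    carrier: str           # e.g. "DHL"
    cutoff_minutes: int    # minutes until the carrier's pickup deadline

def pick_priority(line: OrderLine) -> tuple:
    # Prime orders first, then the tightest carrier cut-off,
    # then heavier items first so they go at the bottom of the cart.
    return (not line.prime, line.cutoff_minutes, -line.weight_kg)

def plan_picking(lines: list[OrderLine]) -> list[OrderLine]:
    return sorted(lines, key=pick_priority)
```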
Mind you, AI is still tremendously useful in the analysis phase, and can sort of help in some steps of the implementation one, but the number of times you can avoid looking thoroughly at the code for any minor issue or discrepancy is absolutely close to 0.
It just one shots bug fixes in complex codebases.
Copy-paste the bug report and watch it go.
It might scale.
So far, I'm not convinced, but let's take a look at what's fundamentally happening and why humans > agents > LLMs.
At its heart, programming is a constraint satisfaction problem.
The more constraints (requirements, syntax, standards, etc) you have, the harder it is to solve them all simultaneously.
New projects with few contributors have fewer constraints.
The process of “any change” is therefore simpler.
Now, undeniably
1) Agents have improved the ability to solve constraints by iterating, e.g. generate, test, modify, etc. over raw LLM output.
2) There is an upper bound (context size, model capability) on the ability to solve simultaneous constraints.
3) Most people have a better ability to do this than agents (including Claude Code using Opus 4.5).
So, if you're seeing good results from agents, you probably have a smaller set of constraints than other people.
Similarly, if you're getting bad results, you can probably improve them by relaxing some of the constraints (consistent UI, number of contributors, requirements, standards, security requirements, splitting code into well-defined packages).
This will make both agents and humans more productive.
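To make the framing concrete, here's a toy (purely illustrative) Python example: treat each constraint as a predicate over a randomly generated "draft" and count how many generate/test iterations it takes to satisfy all of them at once. The constraints here are arbitrary bit checks, nothing like real requirements.

```python
# Toy generate/test loop: each added constraint multiplies how much random
# "generation" it takes to satisfy them all simultaneously.

import random

def make_constraints(n: int):
    # Each constraint is satisfied by roughly half of all candidate drafts.
    return [lambda draft, bit=i: (draft >> bit) & 1 == 1 for i in range(n)]

def attempts_until_all_pass(n_constraints: int, rng: random.Random) -> int:
    constraints = make_constraints(n_constraints)
    attempts = 0
    while True:
        attempts += 1
        draft = rng.getrandbits(max(n_constraints, 1))
        if all(check(draft) for check in constraints):
            return attempts

rng = random.Random(0)
for n in (2, 6, 10):
    runs = [attempts_until_all_pass(n, rng) for _ in range(50)]
    print(f"{n} constraints: ~{sum(runs) / len(runs):.0f} drafts on average")
```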
The open question is: will models continue to improve enough to approach or exceed human level ability in this?
Are humans willing to relax the constraints enough for it to be plausible?
I would say the people currently clamoring about the end of human developers are cluelessly deceived by the “appearance of complexity”, which does not match the “reality of constraints” in larger applications.
Opus 4.5 cannot do the work of a human on the code bases I've worked on. Hell, talented humans struggle to work on some of them.
…but that doesn't mean it doesn't work.
Just that, right now, the constraint set it can solve is not large enough to be useful in those situations.
…and increasingly we see low quality software where people care only about speed of delivery; again, lowering the bar in terms of requirements.
So… you know. Watch this space. I'm not counting on having a dev job in 10 years. If I do, it might be making a pile of barely working garbage.
…but I have one now, and anyone who thinks that this year people will be largely replaced by AI is probably poorly informed and has misunderstood the capabilities of these models.
There's only so low you can go in terms of quality.
Based on my experience using these LLMs regularly, I strongly doubt it could build any application of realistic complexity without screwing things up in major ways everywhere, and even then it still wouldn't meet all the requirements.
If you have a microservices architecture in your project, you are set for AI. You can swap out any lacking, legacy microservice in your system with a "greenfield" vibecoded one.
Man, I've been biting my tongue all day with regards to this thread and overall discussion.
I've been building a somewhat-novel, complex, greenfield desktop app for 6 months now, conceived and architected by a human (me), visually designed by a human (me), with the implementation leaning heavily on Claude Code, with Codex and Gemini thrown into the mix for the grunt work. I have decades of experience and could probably have built it bespoke in 1-2 years, but I wanted a real project to kick the tires on "the future of our profession".
TL;DR I started with 100% vibe code simply to test the limits of what was being promised. It was a functional toy that had a lot of problems. I started over and tried a CLI version. It needed a therapist. I started over and went back to a visual UI. It worked but was too constrained. I started over again. After about 10 complete start-overs in blank folders, I had a better vision of what I wanted to make, and how to achieve it.

Since then, I've been working day after day, screen after screen, building, refactoring, going feature by feature, bug after bug, exactly how I would if I was coding manually. Many times I've reached a point where it feels "feature complete", until I throw a bigger dataset at it, which brings it to its knees. Time to re-architect, re-think memory and storage and algorithms and libraries used. The code bloated, and I put it on a diet until it was trim and svelte. I've tried many different approaches to hard problems, some of which LLMs suggested and which truly surprised me in their efficacy, but only after I presented the issues with the previous implementation.

There's a lot of conversation and back and forth with the machine, but we always end up getting there in the end. Opus 4.5 has been significantly better than previous Anthropic models. As I hit milestones, I manually audit code, rewrite things, reformat things, generally polish the turd.
I tell this story only because I'm 95% there to a real, legitimate product, with 90% of the way to go still. It's been half a year.
Vibe coding a simple app that you just want to use personally is cool; let the machine do it all, don't worry about under the hood, and I think a lot of people will be doing that kind of stuff more and more because it's so empowering and immediate.
Using these tools is also neat and amazing because they're a force multiplier for a single person or small group who really understand what needs to be done and what decisions need to be made.
These tools can build very complex, maintainable software if you can walk with them step by step and articulate the guidelines and guardrails, testing every feature, pushing back when it gets it wrong, growing with the codebase, getting in there manually whenever and wherever needed.
These tools CANNOT one-shot truly new stuff, but they can be slowly cajoled and massaged into eventually getting you where you want to go; like, hard things are hard, and things that take time don't get done for a while. I have no moral compunctions or philosophical musings about utilizing these tools, but IMO there's still significant effort and coordination needed to make something really great using them (and literally minimal effort and no coordination needed to make something passable).
If you're solo, know what you want, and know what you're doing, I believe you might see 2x, 4x gains in time and efficiency using Claude Code and all of his magical agents, but if your project is more than a toy, I would bet that 2x or 4x is applied to a temporal period of years, not days or months!
>day to day, when I ask it "build me this feature" it uses strange abstractions, and often requires several attempts on my part to do it in the way I consider "right"
Then don't ask it to "build me this feature"; instead, lay out a software development process with a designated human in the loop where you want it and guard rails to keep it on track. Create a code review agent to look for and reject strange abstractions. Tell it what you don't like, and it's really good at finding it.
I find Opus 4.5, properly prompted, to be significantly better at reviewing code than writing it, but you can just put it in a loop until the code it writes matches the review.
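A minimal sketch of that loop, where run_agent() is a hypothetical stand-in for however you actually invoke the model (CLI, SDK, whatever), not a real API:

```python
# Hypothetical write/review loop: an implementer agent produces a diff, a
# reviewer agent rejects it until it passes, then a human still reads it.

def run_agent(role_prompt: str, task: str) -> str:
    raise NotImplementedError("wire this up to your agent runner of choice")

def build_with_review(task: str, max_rounds: int = 5) -> str:
    feedback = ""
    for _ in range(max_rounds):
        diff = run_agent(
            "You are the implementer. Follow the repo's style guide.",
            f"{task}\n\nAddress this review feedback, if any:\n{feedback}",
        )
        review = run_agent(
            "You are the reviewer. Reject strange abstractions and style-guide "
            "violations. Reply APPROVED if the diff is acceptable.",
            diff,
        )
        if "APPROVED" in review:
            return diff  # the human in the loop still reviews before merging
        feedback = review
    raise RuntimeError("review loop did not converge; escalate to a human")
```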
> The hard thing about engineering is not "building a thing that works", it's building it the right way, in an easily understood way, in a way that's easily extensible.
The number of production applications that achieve this rounds to zero
I’ve probably managed 300 brownfield web, mobile, edge, datacenter, data processing, and ML applications/products across DoD, B2B, and consumer, and literally zero of them were built this way.
This is the exact copium I came here to enjoy.
You can definitely just tell it what abstractions you want when adding a feature and do incremental work on an existing codebase. But I generally prefer GPT-5.2.
"its building it the right way, in an easily understood way, in a way that's easily extensible"
I am in a unique situation where I work with a variety of codebases over the week. I have had no problem at all using Claude Code w/ Opus 4.5 and Gemini CLI w/ Gemini 3.0 Pro to produce excellent code that is indisputably done "the right way", in an extremely clear and understandable way, and that is maximally extensible. None of them are greenfield projects.
I feel like this is a bit of a je ne sais quoi argument, where people appeal to some indemonstrable essence that these tools just can't achieve, and only the "non-technical" people are foolish enough not to realize it. I'm a pretty technical person (about 30 years of software development, up to staff engineer and then VP). I think they have reached a pretty high level of competence. I still audit the code and monitor their creations. I don't think they're the oft-claimed "junior developer" replacement; instead they do the work I would have gotten from a very experienced, expert-level developer, but instead of being an expert in one niche, they're experts at almost every niche.
Are they perfect? Far from it. It still requires a practitioner who knows what they're doing. But frequently on here I see people giving takes that sound like they last used some early variant of Copilot or something and think that remains state of the art. The rest of us are just accelerating our lives with these tools, knowing that pretending they suck online won't slow their ascent an iota.
Not necessarily responding to you directly, but I find this take to be interesting, and I see it every time an article like this makes the rounds.
Starting back in 2022/2023:
- (~2022) It can auto-complete one line, but it can't write a full function.
- (~2023) Ok, it can write a full function, but it can't write a full feature.
- (~2024) Ok, it can write a full feature, but it can't write a simple application.
- (~2025) Ok, it can write a simple application, but it can't create a full application that is actually a valuable product.
- (~2025+) Ok, it can write a full application that is actually a valuable product, but it can't create a long-lived complex codebase for a product that is extensible and scalable over the long term.
It's pretty clear to me where this is going. The only question is how long it takes to get there.