Judging from all the comments here, it’s going to be amazing seeing the fallout from all the LLM-generated code in a year or so. The number of people who seemingly relish the ability to stop thinking and let the model generate giant chunks of their code base is, uh, something else lol.
I disagree from almost the first sentence:
> Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. You can safely ignore them if they don’t fit your workflows at the moment.
Learning how to use LLMs in a coding workflow is trivial to start, but you quickly get a bad taste if you don't learn how to adapt both your workflow and its workflow. It is easy to get a trivially good result and then be disappointed in the follow-up. It is easy to start on something it's not good at and conclude it's worthless.
The pure dismissal of Cursor, for example, means that the author didn't learn how to work with it. Now, it's certainly limited, and some people just prefer Claude Code; I'm not saying that's unfair. However, it requires a process adaptation.
LLMs are basically glorified slot machines. Some people try very hard to come up with techniques or theories about when the slot machine is hot, but it's only an illusion; let me tell you, it's random and arbitrary: maybe today is your lucky day, maybe not. Same with AI. Learning the "skill" is as difficult as learning how to google or how to check Stack Overflow, i.e. trivial. All the rest is luck and how many coins you have in your pocket.
> Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. [...]
LLMs will always suck at writing code that has not been written millions of times before. As soon as you venture slightly offroad, they falter.
That right there is your learning curve! Getting LLMs to write code that's not heavily represented in their training data takes experience and skill and isn't obvious to learn.
LLM-driven coding can yield awesome results, but you will be typing a lot and, as the article states, it requires an already well-structured codebase.
I recently started a fresh project, and until I got to the desired structure I only used AI to ask questions or get suggestions. I organized and wrote most of the code myself.
Once it started to get into the shape that felt semi-permanent to me, I started a lot of queries like:
```
- Look at existing service X at folder services/x
- see how I deploy the service using k8s/services/x
- see what the Dockerfile for service X looks like at services/x/Dockerfile
- now, I started service Y that does [this and that]
- create all that is needed for service Y to be skaffolded and deployed, follow the same pattern as service X
```
And it would go, read existing stuff for X, then generate all of the deployment/monitoring/readme/docker/k8s/helm/skaffold for Y
With zero to no mistakes. Both Claude and Gemini are more than capable of doing such a task. I had both of them generate 10-15 files with no errors, with the code being deployable right after (of course the service will just answer and not do much more than that).
Then, I will take over again for a bit, do some business logic specific to Y, then again leverage AI to fill in missing bits, review, suggest stuff etc.
It might look slow, but it actually cuts out the most boring and most error-prone steps when developing a medium-to-large k8s-backed project.
Deeply curious to know if this is an outlier opinion, a mainstream but pessimistic one, or the general consensus. My LinkedIn feed and personal network certainly suggest that it's an outlier, but I wonder if the people around me are overly optimistic or out of sync with what the HN community is experiencing more broadly.
To the people commenting on and getting defensive about this bit:
> Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. You can safely ignore them if they don’t fit your workflows at the moment.
How much of your workflow or intuition from 6 months ago is still relevant today? How long would it take to learn the relevant bits today?
Keep in mind that Claude Code was released less than 6 months ago.
I have built many pipelines integrating LLMs to drive real $ results. I think this article boils it down too simply. But I always remember: if the LLM is the most interesting part of your work, something is severely wrong and you probably aren’t adding much value.

Context management based on some aspects of your input is where LLMs get good, but you need to do lots of experimentation to tune something. Most cases I have seen are about developing one pipeline to fit hundreds of extremely different cases; the LLM does not solve this problem but basically serves as an approximator that lets you discretize previously large problems into some information subspace where you can treat the infinite set of inputs as something you know. LLMs are like a lasso (a better or worse one than traditional lassos, depending on use case), but once you get your catch you still need to process it and deal with it programmatically to solve some greater problem.

I hate how so many LLM-related articles/comments say “AI is useless, throw it away, don’t use it” or “AI is the future, if we don’t do it now we’re doomed, let’s integrate it everywhere, it can solve all our problems.” Can anyone pick a happy medium? Maybe that’s what being in a bubble looks like.
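For what it's worth, the "lasso, then process programmatically" pattern the parent describes looks roughly like this in practice. This is a minimal TypeScript sketch; `callLLM`, the label set, and the commented-out downstream handlers are all hypothetical stand-ins, not anything from the article:

```
// Minimal sketch of the "lasso" idea: the LLM only maps messy free-form input
// onto a small, known label set; everything after that is ordinary code.
// `callLLM` is a hypothetical helper standing in for whatever client you use.

type Category = "refund" | "bug_report" | "sales" | "other";

declare function callLLM(prompt: string): Promise<string>;

async function classifyTicket(ticket: string): Promise<Category> {
  const raw = await callLLM(
    `Classify the support ticket into exactly one of: refund, bug_report, sales, other.\n` +
      `Reply with the label only.\n\nTicket:\n${ticket}`
  );
  const label = raw.trim().toLowerCase();
  const allowed: Category[] = ["refund", "bug_report", "sales", "other"];
  // Validate the model's output instead of trusting it blindly.
  return allowed.includes(label as Category) ? (label as Category) : "other";
}

// Deterministic handling takes over once the input has been discretized.
async function handleTicket(ticket: string): Promise<void> {
  switch (await classifyTicket(ticket)) {
    case "refund":
      // enqueueRefundWorkflow(ticket); // hypothetical downstream step
      break;
    case "bug_report":
      // fileIssue(ticket);
      break;
    case "sales":
      // forwardToSales(ticket);
      break;
    default:
      // routeToHumanTriage(ticket);
      break;
  }
}
```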
So many articles should prepend “My experience with ...” to their title. Here is OP's first sentence: “I spent the past ~4 weeks trying out all the new and fancy AI tools for software development.” Dude, you have had some experiences and they are worth writing up and sharing. But your experiences are not a stand-in for "the current state." This point applies to a significant fraction of HN articles, to the point that I wish the headlines were flagged “blog”.
>I made a CLI logs viewers and querier for my job, which is very useful but would have taken me a few days to write (~3k LoC)
I recall The Mythical Man-Month stating a rough calculation that the average software developer writes about 10 net lines of new, production-ready code per day. For a tool like this going up an order of magnitude to about 100 lines of pretty good internal tooling seems reasonable.
OP sounds a few cuts above the 'average' software developer in terms of skill level. But here we also need to point out that a CLI log viewer and querier is not the kind of thing you actually need to be a top-tier developer to crank out, even in the pre-LLM era, unless you were going for lnav [1] levels of polish.
[1]: https://lnav.org/
Interesting read, but strange to totally ignore the macOS ChatGPT app that optionally integrates with a terminal session, the currently opened VSCode editor tab, Xcode, etc. I use this combination at least 2 or 3 times a month, and even if my monthly use is less than 40 minutes total, it is a really good tool to have in your toolbelt.
The other thing I disagree with is the coverage of gemini-cli: if you use gemini-cli for a single long work session, then you must set your Google API key as an environment variable when starting it; otherwise you end up, after a short while, falling back to Gemini 2.5 Flash, and that leads to unhappy results. So use gemini-cli for free for short, focused 3 or 4 minute work sessions and you are good, or pay for longer work sessions and you are good.
I do have a random off topic comment: I just don’t get it: why do people live all day in an LLM-infused coding environment? LLM based tooling is great, but I view it as something I reach for a few times a day for coding and that feels just right. Separately, for non-coding tasks, reaching for LLM chat environments for research and brainstorming is helpful, but who really needs to do that more than once or twice a day?
I think that beyond the language used, the article does have some points I agree with. In general, LLMs code better in languages that are better represented online, where they can be trained on a larger amount of source code. Python is not the same as PL/I (I don't know if you've tried it, but with the latter, they don't know even the most basic conventions used in its development).
When it is mentioned that LLMs "have terrible code organization skills", I think they are referring mainly to the size of the context. It is not the same to develop a module with hundreds of LoCs, one with thousands or one with tens of thousands of LoCs.
I don't put much stock in the skill-degradation argument; I am not aware of a study that validates it. On the other hand, it is true that agents are constantly evolving, and I don't see any difficulties that cannot be overcome given the current pace of evolution, since, in the end, coding is one of the most accessible tasks for artificial intelligence.
Opening the essay with "Learning how to use LLMs in a coding workflow is trivial" and closing by suggesting Copilot as the AI agent is the worst take on LLM coding I have ever seen.
I think we're still in the gray zone of the "Incessant Obsolescence Postulate" (the Wait Calculation). Are you better off "skilling up" on the tech as it is today, or waiting for it to just "get better" so by the time you kick off, you benefit from the solved-problems X years from now. I also think this calculation differs by domain, skill level, and your "soft skill" abilities to communicate, explain and teach. In some domains, if you're not already on this train, you won't even get hired anymore.
The current state of LLM-driven development is already several steps down the path of an end-game where the overwhelming majority of code is written by the machine; our entire HCI for "building" is going to be so far different to how we do it now that we'll look back at the "hand-rolling code era" in a similar way to how we view programming by punch-cards today. The failure modes, the "but it SUCKS for my domain", the "it's a slot machine" etc etc are not-even-wrong. They're intermediate states except where they're not.
The exceptions to this end-game will be legion and exist only to prove the end-game rule.
> By being particularly bad at anything outside of the most popular languages and frameworks, LLMs force you to pick a very mainstream stack if you want to be efficient.
Do they? I’ve found Clojure-MCP[1] to be very useful. OTOH, I’m not attempting to replace myself, only augment myself.
OP did miss the VSCode extension for Claude Code. It is still terminal-based, but:
- it shows you the diff of the incoming changes in VSCode (like git)
- it knows the line you selected in the editor, for context
Good read. I just want to point out that LLMs seem to write better React code, but as an experienced frontend developer my opinion is that they're also bad at React. Their approach is outdated and doesn't follow the latest guidelines. They write React as I would have written it in 2020. So as usual, you need to feed the right context to get proper results.
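To make "writes React like 2020" concrete, here is the kind of pattern I mean. This is an illustrative sketch, not output from any particular model: mirroring derived data into state with an extra useState/useEffect pair instead of computing it during render, which current React guidance discourages:

```
import { useEffect, useState } from "react";

type Props = { items: string[]; query: string };

// The dated pattern LLMs tend to produce: derived data mirrored into state
// and kept in sync with an effect.
function FilteredListOld({ items, query }: Props) {
  const [filtered, setFiltered] = useState<string[]>([]);

  useEffect(() => {
    setFiltered(items.filter((i) => i.includes(query)));
  }, [items, query]);

  return <ul>{filtered.map((i) => <li key={i}>{i}</li>)}</ul>;
}

// Current guidance: derived data is simply computed during render.
function FilteredList({ items, query }: Props) {
  const filtered = items.filter((i) => i.includes(query));

  return <ul>{filtered.map((i) => <li key={i}>{i}</li>)}</ul>;
}
```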
Relying on LLM for any skill, especially programming, is like cutting your own healthy legs and buying crutches to walk. Plus you now have to pay $49/month for basic walking ability and $99/month for "Walk+" plan, where you can also (clumsily) jog.
I find all AI coding goes something like this algorithm:
1. I let the AI do something
2. I find a bad bug or horrifying code
3. I realize I gave it too much slack
4. hand-code for a while
5. go back to narrow prompts
6. get lazy, review code a bit less, add more complexity
7. GOTO 1, hopefully with a better instinct for where/how to trust this model
Then over time you hone your instinct on what to delegate and what to handle yourself. And how deeply to pay attention.
I would actually disagree with the final conclusion here; despite claiming to offer the same models, Copilot seems very much nerfed — cross-comparing the Copilotified LLM and the same LLM through OpenRouter, the Copilot one seems to fail much harder. I'm not an expert in the details of LLMs but I guess there might be some extra system prompt, I also notice the context window limit is much lower, which kinda suggests it's been partially pre-consumed.
In case it matters, I was using Copilot that is for 'free' because my dayjob is open source, and the model was Claude Sonnet 3.7. I've not yet heard anyone else saying the same as me which is kind of peculiar.
I have not tried every IDE/CLI or models, only a few, mostly Claude and Qwen.
I work mostly in C/C++.
The most valuable improvement of using this kind of tools, for me, is to easily find help when I have to work on boring/tedious tasks or when I want to have a Socratic conversation about a design idea with a not-so-smart but extremely knowledgeable colleague.
But for anything requiring a brain, it is almost useless.
Does not mention the actual open source solution that has autocomplete, chat, a planner, and agents, lets you bring your own keys, connect to any LLM provider, customize anything, and rewrite all the prompts and tools.
> By being particularly bad at anything outside of the most popular languages and frameworks, LLMs force you to pick a very mainstream stack if you want to be efficient.
I haven't found that to be true with my most recent usage of AI. I do a lot of programming in D, which is not popular like Python or JavaScript, but Copilot knows it well enough to help me with things like templates, metaprogramming, and interoperating with GCC-produced DLLs on Windows. This is true in spite of the lack of a big pile of training data for these tasks. Importantly, it gets just enough things wrong when I ask it to write code for me that I have to understand everything well enough to debug it.
This article makes me wanna try building a token field in Flutter using an LLM chat or agent. Chat should be enough: a few iterations to get the behaviour and the tests right, and a bit of style to make it look Apple-nice. As if a regular dev would do much better or quicker for this use case; such a bad example IMO, I don't buy it.
I have a biased opinion since I work for a background agent startup currently - but there are more (and better!) out there than Jules and Copilot that might address some of the author's issues.
Strange post. It reads in part like an incoherent rant and in part as a well made analysis.
It’s mostly on point though. In recent years I’ve been assigned to manage and plan projects at work, and I think the skills I’ve learnt from that greatly help me get effective results from an LLM.
The following is half serious. Please enjoy.
Some comments here are reminiscent of antiquated discourse: "how many angels dance on the head of a pin?"
We are somehow trying to agree on some factual ramp-up time required for a dev to become competent at coding with LLMs. This is inherently subjective! Why bother?
Perhaps certain LLMs are blessed with disproportionately more angels (nee "bugs") in the machines.
I enjoyed reading the article:
"The model looks good, but Google’s enshittification has won and it looks like no competent software developers are left. I would know, many of my friends work there."
Yikes!
Credit to the author for having the courage to post publicly.
"LLMs won’t magically make you deliver production-ready code"
Either I'm extremely lucky, or I was lucky to find the guy who said it must all be test-driven and guided by the usual principles like DRY. Claude Code works absolutely fantastically nine times out of 10, and when it doesn't we just roll back the three hours of nonsense it did, postpone the feature, or give it extra guidance.
> By being particularly bad at anything outside of the most popular languages and frameworks, LLMs force you to pick a very mainstream stack if you want to be efficient.
Almost like hiring and scaling a team? There are also benchmarks that specifically measure this, and it's in theory a very temporary problem (the Aider Polyglot benchmark is one such).
There are kind of a lot of errors in this piece. For instance, the problem the author had with Gemini CLI running out of tokens in ten minutes is what happens when you don’t set up (a free) API key in your environment.
My favorite setup so far is using the Claude code extension in VScode. All the power of CC, but it opens files and diffs in VScode. Easy to read and modify as needed.
I agree. I had a similar experience.
There’s an IntelliJ extension for GitHub Copilot.
It’s not perfect but it’s okay.
> By being particularly bad at anything outside of the most popular languages and frameworks, LLMs force you to pick a very mainstream stack if you want to be efficient.
I use clojure for my day-to-day work, and I haven't found this to be true. Opus and GPT-5 are great friends when you start pushing limits on Clojure and the JVM.
> Or 4.1 Opus if you are a millionaire and want to pollute as much possible
I know this was written tongue-in-cheek, but at least in my opinion it's worth it to use the best model if you can. Opus is definitely better on harder programming problems.
> GPT 4.1 and 5 are mostly bad, but are very good at following strict guidelines.
This was interesting. At least in my experience GPT-5 seemed about as good as Opus. I found it to be _less_ good at following strict guidelines though. In one test, Opus avoided a bug by strictly following the rules, while GPT-5 missed it.
> Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. You can safely ignore them if they don’t fit your workflows at the moment.
I'm sorry, but I disagree with this claim. That is not my experience, nor that of many others. It's true that you can make them do something without learning anything. However, it takes time to learn what they are good and bad at, what information they need, and what nonsense they'll do without express guidance. It also takes time to know what to look for when reviewing results.
I also find that they work fine for languages without static types. You do need tests, yes, but you need them anyway.
Personally, I’ve had a pretty positive experience with the coding assistants, but I had to spend some time to develop intuition for the types of tasks they’re likely to do well. I would not say that this was trivial to do.
Like if you need to crap out a UI based on a JSON payload, make a service call, or add a server endpoint, LLMs will typically do this correctly in one shot. These are common operations that are easily extrapolated from their training data. Where they tend to fail is on tasks like business logic, which has specific requirements that aren’t easily generalized.
I’ve also found that writing the scaffolding for the code yourself really helps focus the agent. I’ll typically add stubs for the functions I want and create the overall code structure, then have the agent fill in the blanks. I’ve found this is a really effective approach for preventing the agent from going off into the weeds.
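To illustrate the stub-first approach (the file name, types, and functions below are made up for the example, not taken from any real project), the scaffold handed to the agent might look like this, with the human-written structure fixed and only the TODO bodies left for the model:

```
// csvReport.ts: human-written scaffold; the agent only fills in the TODO bodies.

export interface ReportRow {
  date: string;      // ISO date, e.g. "2025-01-31"
  amount: number;    // amount in cents
  category: string;
}

// Parse raw CSV text (header: date,amount,category) into typed rows.
export function parseCsv(raw: string): ReportRow[] {
  // TODO(agent): split lines, skip header, validate fields, coerce amount to a number
  throw new Error("not implemented");
}

// Sum amounts per category.
export function totalsByCategory(rows: ReportRow[]): Map<string, number> {
  // TODO(agent): accumulate amounts into a Map keyed by category
  throw new Error("not implemented");
}

// Render the totals as an aligned plain-text table for the report.
export function renderTable(totals: Map<string, number>): string {
  // TODO(agent): format rows as "category  amount", amounts as dollars
  throw new Error("not implemented");
}
```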
I also find that if it doesn’t get things right on the first shot, the chances are it’s not going to fix the underlying problems. It tends to just add kludges on top to address the problems you tell it about. If it didn’t get it mostly right at the start, then it’s better to just do it yourself.
All that said, I find enjoyment is an important aspect as well and shouldn’t be dismissed. If you’re less productive, but you enjoy the process more, then I see that as a net positive. If all LLMs accomplish is to make development more fun, that’s a good thing.
I also find that there's use for both terminal based tools and IDEs. The terminal REPL is great for initially sketching things out, but IDE based tooling makes it much easier to apply selective changes exactly where you want.
As a side note, I got curious and asked GLM-4.5 to make a token field widget with React, and it did it in one shot.
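For anyone unfamiliar with the widget being discussed, a token (tag) field in React is roughly the component below. This is a minimal hand-written sketch for reference, not the model's actual output:

```
import { useState, type KeyboardEvent } from "react";

// Minimal token/tag field: Enter or comma turns the typed text into a chip,
// Backspace on an empty input removes the last chip.
export function TokenField() {
  const [tokens, setTokens] = useState<string[]>([]);
  const [draft, setDraft] = useState("");

  function onKeyDown(e: KeyboardEvent<HTMLInputElement>) {
    if (e.key === "Enter" || e.key === ",") {
      e.preventDefault();
      const value = draft.trim();
      if (value && !tokens.includes(value)) setTokens([...tokens, value]);
      setDraft("");
    } else if (e.key === "Backspace" && draft === "") {
      setTokens(tokens.slice(0, -1));
    }
  }

  return (
    <div>
      {tokens.map((t) => (
        <span key={t}>
          {t}
          <button onClick={() => setTokens(tokens.filter((x) => x !== t))}>×</button>
        </span>
      ))}
      <input
        value={draft}
        onChange={(e) => setDraft(e.target.value)}
        onKeyDown={onKeyDown}
        placeholder="Add a token"
      />
    </div>
  );
}
```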
It's also strange not to mention DeepSeek and GLM as options given that they cost orders of magnitude less per token than Claude or Gemini.
"If an(y) LLM could operate on your codebase without much critical issues, then your architecture is sound" - revskill
"Google’s enshittification has won and it looks like no competent software developers are left. I would know, many of my friends work there". Ouch ... I hope his friends are in marketing!
They missed OpenAI Codex, maybe deliberately? It's less LLM development and more vibe coding, or maybe "being a PHB of robots". I'm enjoying it for my side project this week.
> Claude 4 Sonnet
> Or 4.1 Opus if you are a millionaire and want to pollute as much possible
That was an unnecessary guilt-shaming remark.
Yet another developer who is too full of themselves to admit that they have no idea how to use LLMs for development. There's an arrogance that can set in when you get to be more senior and unless you're capable of force feeding yourself a bit of humility you'll end up missing big, important changes in your field.
It becomes farcical when not only are you missing the big thing but you're also proud of your ignorance and this guy is both.
It's all about the Kilo Code extension.
> Learning how to use LLMs in a coding workflow is trivial. There is no learning curve. You can safely ignore them if they don’t fit your workflows at the moment.
I have never heard anybody successfully using LLMs say this before. Most of what I've learned from talking to people about their workflows is counterintuitive and subtle.
It's a really weird way to open up an article concluding that LLMs make one a worse programmer: "I definitely know how to use this tool optimally, and I conclude the tool sucks". Ok then. Also: the piano is a terrible, awful instrument; what a racket it makes.