I don't understand the productivity that people get out of these AI tools. I've tried it and I just can't get anything remotely worthwhile unless it's something very simple or something completely new being built from the ground up.
Like sure, I can ask Claude to give me the barebones of a web service that does some simple task. Or a webpage with some information on it.
But any time I've tried to get AI services to help with bugfixing/feature development on a large, complex, potentially multi-language codebase, it's useless.
And those tasks are the ones that actually take up the majority of my time. On the occasion that I'm spinning a new thing up quickly, I don't really need an AI to do it for me -- I mean, that's the easy part!
Is there something I'm missing? Am I just not using it right? I keep seeing people talk about how addictive it is, how the productivity boost is insane, how all their code is now written by AI and then audited, and I just don't see how that's possible outside of really simple rote programming.
I wish more had been written about the first assertion that using an LLM to code is like gambling and you're always hoping that just one more prompt will get you what you want.
It really captures how little control one has over the process, while simultaneously having the illusion of control.
I don't really believe that code is being made verbose to make more profit. There's probably some element of model providers not prioritizing concise code, but if conciseness while maintaining "quality" were possible, it would give one model a sufficient edge over the others that I suspect providers would do it.
These perverse incentives run at the heart of almost all Developer Software-as-a-Service tooling. Using someone else's hosted model incentivizes increasing token usage, but there's nothing special about AI here.
Consider database-as-a-service companies: they're not incentivized to optimize CPU usage, because they charge per CPU. They're not incentivized to improve disk compression, because they charge for disk usage. There are several DB vendors who explicitly disable disk compression and happily charge for storage capacity.
When you run the software yourself, or the model yourself, the incentives are aligned: use less power, use less memory, use less disk, and so on.
At best, the only useful thing I can get from ChatGPT, DeepSeek, or Grok is keywords I can search on Google to find a valid solution to my problem. I get so frustrated with them that I almost never use LLMs, except for fixing grammar or translating. It's not because I'm against them, but because they are useless to me and a massive waste of time.
I feel like "vibe coding" as a "no look" sort of way to produce anything is bad and will probably remain bad for some time.
However... "vibe architecting" is likely going to be the way forward. I have had success with generating/tuning an architecture plan with AI, having it create stub files/functions then filling them out individually. I can get pretty much the whole way without typing code, but it does require a fair bit more architectural thinking than usual and a good bit of reading code (then telling the AI to "do better").
I think of it like the analogy of blind men describing an elephant when they can only feel a single part. AI is decent at high level architecture and decent at low level production but you need a human to understand the big picture and how the pieces fit (and which ones are missing).
Amusingly, about 90% of my rat's-nest problems with Sonnet 3.7 are solved by simply appending a few words to the end of the prompt:
"write minimum code required"
It's not even that sensitive to the wording - "be terse" or "make minimal changes" amount to the same thing - but the resulting code will often be at least 50% shorter than the un-guided version.
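If you drive the model through the API, you can bake the directive in so every request carries it. A minimal sketch, assuming the `anthropic` Python SDK and a placeholder model id (substitute whichever model you actually use):

```
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

BREVITY = "\n\nwrite minimum code required"  # or "be terse" / "make minimal changes"

def ask(prompt: str) -> str:
    # Append the brevity directive to every prompt so it is never forgotten.
    response = client.messages.create(
        model="claude-3-7-sonnet-latest",  # assumed model id, swap in your own
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt + BREVITY}],
    )
    return response.content[0].text
```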
1. Yes. I've spent several late nights nudging Cline and Claude (and other systems) to the right answers. And being able to use AWS Bedrock to do this has been great (note: I work at Amazon).
2. I've had good fortune keeping the agents to constrained areas, working on functions or objects with clearly defined (by me) boundaries. If the measure of a junior engineer is that you correct them once a day, an engineer once a week, a senior once a month, a principal once a quarter... Treat these agents like hyper-energetic interns. Nudge frequently.
3. Standard org management coding practices apply. Force the agents to show work, plan, unit test, investigate.
And, basically, what I've described is that we're becoming Software Development Managers with teams of on-demand low-quality interns. That's an incredibly powerful tool, but don't expect hyper-elegant and compact code from them. Keep that for the senior engineering staff (humans) for now.
(Note: The AlphaEvolve announcement makes me wonder if I'm going to have hyper-energetic applied science interns next...)
This addiction and fear of things-going-bad-if-I-don't-listen-to-the-copilot is precisely why my workflow is a bit simpler and more caveman-ish:
1. start a project with vague README (or take an existing one).
2. create a Makefile with a "prompt" target that looks something like this (I might put it in a script to work around tabs, etc.):
```
prompt:
	for f in $$(find . -type f \( -name '*.go' -o -name '*.ts' -o -name '*.files_i_care_about' \) | grep -v 'files to ignore'); do \
		echo "// FILE: $$f"; \
		cat $$f; \
	done | pbcopy
```
3. Run `make prompt` to get a fresh starting prompt, go to Gemini (AI Studio), and use the prompt:
```
You have the following files. Understand them, and we will start building some features.
<Ctrl-v to paste the files copied above>
```
4. It thinks, understands and gives me the "I am ready" line.
5. To build feature X I simply prompt it with:
```
I want to build feature X. Understand it, plan it, and do not regenerate entire files. Just give me unix-style diffs.
```
6. Iterate on what I like and don't (including refactors, etc.)
7. Copy patches and apply locally (see the sketch after this list)
8. Repeat steps 5 - 7.
9. After about 300-400k tokens generated (say over 20-40 features) I snapshot with the prompt:
```
Great, now is a great time to checkpoint. Generate a SUMMARY.md on a per-folder basis of your understanding of the current state of the project, along with a roadmap of next steps.
```
10. I save/update the SUMMARY.md files and go to bed. When I come back I repeat from step 2, and voila, the SUMMARY.md files generated before are included too.
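For step 7, applying the pasted diffs can itself be scripted. A minimal sketch, assuming macOS's `pbpaste` (to match the `pbcopy` in the Makefile above) and git-compatible diffs:

```
import subprocess

def apply_clipboard_patch() -> None:
    # Grab the diff the LLM produced from the clipboard...
    diff = subprocess.run(["pbpaste"], capture_output=True, text=True, check=True).stdout
    # ...and apply it; --3way attempts a merge if the surrounding context has drifted.
    subprocess.run(["git", "apply", "--3way", "-"], input=diff, text=True, check=True)

if __name__ == "__main__":
    apply_clipboard_patch()
```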
I have generated about 20M tokens so far at a cost of $0. For me, "copy/pasting" diffs is not a big deal. Getting clean code and having a nice custom workflow is more important. I am still not ready to relinquish control fully to an agent. I just want a really good code search/auto-complete out of the LLM that adheres to *my* interfaces and constraints.
There are two sets of perverse incentives at play. The main one the author focuses on is that LLM companies are incentivized to produce verbose answers, so that when you task an LLM on extending an already verbose project, the tokens used and therefore cost increases.
The second one is more intra/interpersonal: under pressure to produce, it's very easy to rely on LLMs to get one 80% of the way there and polish the remaining 20%. I'm in a new domain that requires learning a new language. So something I've started doing is asking ChatGPT to come up with exercises / coding etudes / homework for me based on past interactions.
> Its “almost there” quality — the feeling we’re just one prompt away from the perfect solution — is what makes it so addicting. Vibe coding operates on the principle of variable-ratio reinforcement, a powerful form of operant conditioning where rewards come unpredictably. Unlike fixed rewards, this intermittent success pattern (“the code works! it’s brilliant! it just broke! wtf!”), triggers stronger dopamine responses in our brain’s reward pathways, similar to gambling behaviors.
Though I'm not a "vibe coder" myself I very much recognize this as part of the "appeal" of GenAI tools more generally. Trying to get Image Generators to do what I want has a very "gambling-like" quality to it.
I understand your point. The vibe approach is IMO only effective when you adopt a software engineering mindset. Here's how it works (at least for me, with Copilot agent mode):
1. Develop a Minimum Viable Product (MVP) or prototype that functions.
2. Write tests, either before or after the initial development.
3. Implement coding guidelines, style guides, linter etc. Do code reviews.
4. Continuously adjust, add features, refactor, review, and expand your test suite. Iterate, and let the AI run tests and linters on each change (a minimal check-runner sketch follows this list).
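To make step 4 concrete, it helps to give the AI a single entry point to run after every change. A minimal sketch of such a check runner (`ruff` and `pytest` are assumptions; substitute your project's own tools):

```
import subprocess
import sys

CHECKS = [
    ["ruff", "check", "."],  # linter (assumed; use your own)
    ["pytest", "-q"],        # test suite (assumed; use your own)
]

def main() -> int:
    for cmd in CHECKS:
        print(">>", " ".join(cmd))
        if subprocess.run(cmd).returncode != 0:
            return 1  # fail fast so the agent gets an unambiguous signal
    return 0

if __name__ == "__main__":
    sys.exit(main())
```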
While this process may seem lengthy, it ensures reliability and efficiency. Experienced engineers might find it as quick as working solo, but the structured approach guarantees success. It feels like pairing with an inexperienced developer.
Also, this process may run you into rate limits with Copilot and might not work with your current codebase due to a lack of tests and the absence of applied coding style guides.
Additionally, it takes time. For example, for a simple to mid-level tool/feature in Go, it might take about 1 hour to develop the MVP or prototype, but another 6 to 10 hours to refine it to a quality that you might want to show to other engineers.
Curious what happens if, rather than asking for conciseness, one asks for elegance.
It'd be great to forbid comments, e.g. using some of the techniques pioneered by Outlines, Instructor, et al.
For now I'll do it with some examples in my context priming prompt, like:
Do not emit comments. Instead of this:

    # frobnicate a fyzzit
    def frobnicate_fyzzit(self):
        """
        This function frobnicates a fyzzit.
        """
        # Get the fyzzit to frobnicate.
        fyzzit = ...
        ...

Do this:

    def frobnicate_fyzzit(self):
        fyzzit = ...
        ...
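Without constrained decoding, a post-processing pass can also enforce the rule after generation. A minimal sketch using Python's `tokenize` module (it strips `#` comments only; docstrings would need separate handling):

```
import io
import tokenize

def strip_comments(source: str) -> str:
    # Drop COMMENT tokens and let untokenize reassemble the rest.
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    kept = [tok for tok in tokens if tok.type != tokenize.COMMENT]
    return tokenize.untokenize(kept)

print(strip_comments("x = 1  # the loneliest number\n"))  # -> "x = 1\n" (modulo padding)
```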
I found it useful to ask it to write with minimal lines: let it write functional code first, then ask it to remove all unnecessary lines for minimalistic, efficient code.
> it might be difficult for AI companies to prioritize code conciseness when their revenue depends on token count.
Would open source, local models keep pressure on AI companies to prioritize the usable code, as code quality and engineering time saved are critical to build vs buy discussions?
I've pretty clearly seen the critical thinking ability of coworkers who depend on AI too much sharply decline over the past year. Instead of taking 30 seconds to break down the problem and work through assumptions, they immediately copy/paste into an LLM and spit back what it tells them.
This has led to their abilities stalling while their output seemingly goes up. But when you look at the quality of their output, and their ability to get projects over the last 10% or make adjustments to an already completed project without breaking things, it's pretty horrendous.
"Vibe coding as gacha game" is a new wrinkle I didn't expect. It certainly explains why I see people who should know better talking up AI and LLMs like they're the second coming: it's like how stoners talk about weed as a cancer cure.
> There was no standardization of parts in the probe. Two widgets intended to do almost the same job could be subtly different or wildly different. Braces and mountings seemed hand carved. The probe was as much a sculpture as a machine.
> Blaine read that, shook his head, and called Sally. Presently she joined him in his cabin.
> "Yes, I wrote that," she said. "It seems to be true. Every nut and bolt in that probe was designed separately. It's less surprising if you think of the probe as having a religious purpose. But that's not all. You know how redundancy works?"
> "In machines? Two gilkickies to do one job. In case one fails."
> "Well, it seems that the Moties work it both ways."
> "Moties?"
> She shrugged. "We had to call them something. The Mote engineers made two widgets do one job, all right, but the second widget does two other jobs, and some of the supports are also bimetallic thermostats and thermoelectric generators all in one. Rod, I barely understand the words. Modules: human engineers work in modules, don't they?"
> "For a complicated job, of course they do."
> "The Moties don't. It's all one piece, everything working on everything else. Rod, there's a fair chance the Moties are brighter than we are."
- The Mote in God's Eye, Larry Niven and Jerry Pournelle (1974)
[…too bad that today's LLMs are not brighter than we are, at least when it comes to writing correct code…]
Is using the APIs worth the extra cost vs. using the web tools? I haven't used any API tools (I am not a programmer), but I have generated many millions of tokens in the web canvas, something that would cost way more than the $20 I spend for them.
Considering most of these LLMs are in some kind of loss-leading ramp-up phase, wouldn't the incentive lean toward using the fewest tokens?
Weren't they recently complaining that people thanking LLMs were costing them too much money?
This article captures a lot of the problem. It’s often frustrating how it tries to work around really simple issues with complex workarounds that don’t work at all. I tell it the secret simple thing it’s missing and it gets it. It always makes me think, god help the vibe coders that can’t read code. I actually feel bad for them.
I've definitely noticed that LLMs want to generate Enterprise-Grade™ code right out of the box. I customize the prompts to tell them that we're under intense pressure to minimize line counts, every line costs $10k, and so to find the simplest solution that will get the job done.
As it happens, I wrote about the need for planning and organizing work (for greenfield or understanding existing projects) only yesterday: https://taoofmac.com/space/blog/2025/05/13/2230
I would seriously consider banning "vibe coding" right now, because:
1. Poor solutions.
2. Solutions not understood by the person who prompted them.
3. Development team being made dumber.
4. Legal and ethical concerns about laundering open source copyrights.
5. I'm suspicious of the name "vibe coding", like someone is intentionally marketing it to people who don't care to be good at their jobs.
6. I only want to hire people who can do holistically better work than current "AI". (Not churn code for a growth startup's Potemkin Village, nor to only nominally satisfy a client's requirements while shipping them piles of counterproductive garbage.)
7. Publicizing that you are a no-AI-slop company might scare away the majority of the bad prospective employees, while disproportionately attracting the especially good ones. (Not that everyone who uses "AI" is bad, but they've put themselves in the bucket with all the people who are bad, and that's a vastly better filter for the art of hiring than whether someone has spent months memorizing LeetCode answers solely for interviews.)
I agree with the thrust of the blog post, but honestly most of the increased line count comes from the fact that LLMs have been whipped into leaving moronic comments like:

    // Create a copy of the state
    const stateCopy = copyState(state);
Unlimited Claude Code for $100 on the Max plan is a game changer.
I disagree with the idea that LLM providers are deliberately designing solutions to consume more tokens. We're in the early days of agentic coding, and the landscape is intensely competitive. Providers are focused on building highly capable systems to drive adoption, especially with open-source alternatives just a git clone away.
Yes, Claude Code can be token-heavy, but that's often a trade-off for their current level of capability compared to other options. Additionally, Claude Code has built-in levers for cost (I prefer they continue to focus on advanced capability, let pricing accessibility catch up).
"early days" means:
- Prompt engineering is still very much a required skill for better code and lower pricing
- Same with still needing to be an engineer for the same reasons, and:
- Devs need to actively guide these agents. This includes detailed planning, progress tracking, and careful context management – which, as the author notes, is more involved than many realize. I've personally found success using Gemini to create structured plans for Claude Code to execute, which helps manage its verbosity and focus to "thoughtful" execution (as guided by gemini). I drop entire codebases into Gemini (for free).
> I have probably spent over $1,000 vibe coding various projects into reality
dude, you can use Gemini Pro 2.5 with Cline - it's free and is rated at least as good as Claude Sonnet 3.7 right now.
Dopamine? That sort of thing triggers cortisol for me if anything!
In section 4, the author writes "... cheaper than Claude 3.7 ($0.80 per token vs. $3)".
This is an obvious mistake: the price is per megatoken (million tokens), not per token.
Noted — but honestly, that's somewhat expected. Vibe-style coding often lacks structure, patterns, and architectural discipline. That means the developer must do more heavy lifting: decide what they want, and be explicit — whether that’s 'avoid verbosity,' 'use classes,' 'encapsulate logic,' or 'handle errors properly.'
Can we please stop using "vibe coding" to mean "AI-assisted coding"? (best breakdown, imo: https://simonwillison.net/2025/Mar/19/vibe-coding/)
Is it really vibe coding if you are building a detailed coding plan, conducting "git-based experimentation with ruthless pruning", and essentially reviewing the code incrementally for correctness and conciseness? Sure, it's a process dependent on AI, but it's very far from nearly "forget[ting] that the code even exists".
That all said, I do think the article captures some of the current cost/quality dilemmas. I wouldn't jump to conclusions that these incentives are actually driving most current training decisions, but it's an interesting area to highlight.
Really makes you wonder where this is all going. What is going to be the thing where we say "Maybe we took this a little too far." I'm sure whatever bloated react apps we see today are nothing in comparison to the monstrosities we have in store for us in the future.
If I do the same with a human developer instead of an AI, it's called ordering, not vibe coding.
What’s the difference?
Claude was last week.
The author should try Gemini; it's much better.
This article ignores the enormous demand for AI coding paired with the competition between providers. Reducing the price of tokens means that people can afford to generate more tokens. A provider being cheaper on average to operate than another is a competitive advantage.
I generally agree with the concerns of this article, and wonder about the theory that the LLM has an innate inclination to generate bloated code.
Even in this article though, I feel like there is a lot of anthropomorphization of LLMs.
> LLMs and their limitations when reasoning about abstract logic problems
As I understand them, LLMs don't "reason" about anything. It's purely a statistical sequencing of words (or other tokens) as determined by the training set and the prompt. Please correct me if I'm wrong.
Also, regarding this theory that the models may be biased to produce bloated code: I've reposted this once already, and no one has replied yet, and I still wonder:
----------
To me, this represents one of the most serious issues with LLM tools: the opacity of the model itself. The code (if provided) can be audited for issues, but the model, even if examined, is an opaque statistical amalgamation of everything it was trained on.
There is no way (that I've read of) for identifying biases, or intentional manipulations of the model that would cause the tool to yield certain intended results.
There are examples of DeepSeek generating results that refuse to acknowledge Tiananmen Square, etc. These serve as examples of how the generated output can be intentionally biased, without the ability to readily predict this general class of bias by analyzing the model data.
----------
I'm still looking for confirmation or denial on both of these questions...
GPT-4.1 doesn't have the issue of doing more than it should, which is why I prefer it for agentic coding. If anything, it may do a little less than it should, which I can remedy with my handwritten code or a follow-up prompt. Oh, and GPT-4.1 has an input context that is over twice as large as Claude's.
I can feel how the extreme autocomplete of AI is a drug.
Half of my job is fighting the "copy/paste/change one thing" garbage that developers generate. Keeping code DRY. The autocompletes do an amazing job of automating the repeated boilerplate. "Oh you're doing this little snippet for the first and second property? Obviously you want to do that for every property! Let me just expand that out for you!"
And I'm like "oooh, that's nice and convenient".
...
But I also should be looking at that with the stink-eye... part of that code is now duplicated a dozen times. Is there any way to reduce that duplication to the bare minimum? At least so it's only one duplicated declaration or call and all of the rest is per-thingy?
Or any way to directly/automatically wrap the thing without going property-by-property?
Normally I'd be asking myself these questions by the 3rd line. But this just made a dozen of those in an instant. And it's so tempting and addictive to just say "this is fine" and move on.
That kind of code is not fine.
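The duplication usually has a mechanical fix that is barely longer than one pasted copy. A hypothetical sketch of the property-by-property case (all names are illustrative):

```
# What the autocomplete happily expands, one pasted line per property:
#     dst.name = src.name
#     dst.size = src.size
#     ... a dozen more ...

# The DRY version: declare the list once, loop over it.
FIELDS = ["name", "size", "color", "weight"]  # hypothetical property names

def copy_fields(src: object, dst: object) -> None:
    for field in FIELDS:
        setattr(dst, field, getattr(src, field))
```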
https://archive.ph/EzbNK