> Or just don't use AI to write code.
Anecdata, but I'm still finding CC to be absolutely outstanding at writing code.
It's regularly writing systems-level code that would take me months to write by hand in hours, with minimal babysitting, basically no "specs" - just giving it coherent sane direction: like to make sure it tests things in several different ways, for several different cases, including performance, comparing directly to similar implementations (and constantly triple-checking that it actually did what you asked after it said "done").
For $200/mo, I can still run 2-3 clients almost 24/7 pumping out features. I rarely clear my session. I haven't noticed quality declines.
Though, I will say, one random day - I'm not sure if it was dumb luck - or if I was in a test group, CC was literally doing 10x the amount of work / speed that it typically does. I guess strange things are bound to happen if you use it enough?
Related anecdata: IME, there has been a MASSIVE decline in the quality of claude.ai (the chatbot interface). It is so different recently. It feels like a wanna-be crapier version of ChatGPT, instead of what it used to be, which was something that tried to be factual and useful rather than conversational and addictive and sycophantic.
How can a person reconcile this comment with the one at the root of this thread? One person says Claude struggles to even meet the strict requirements of a spec sheet, another says Claude is doing a great job and doesn’t even need specific specs?
I have my own anecdata but my comment is more about the dissonance here.
> basically no "specs" - just giving it coherent sane direction
This is one variable I almost always see in this discussion: the more strict the rules that you give the LLM, the more likely it is to deeply disappoint you
The earlier in the process you use it (ie: scaffolding) the more mileage you will get out of it
It's about accepting fallability and working with it, rather than trying to polish it away with care
But, how do you know the code is good?
If you do spot checks, that is woefully inadequate. I have lost count of the number of times when, poring over code a SOTA LLM has produced, I notice a lot of subtle but major issues (and many glaring ones as well), issues a cursory look is unlikely to pick up on. And if you are spending more time going over the code, how is that a massive speed improvement like you make it seem?
And, what do you even mean by 10x the amount of work? I keep saying anybody that starts to spout these sort of anecdotes absolutely does NOT understand real world production level serious software engineering.
Is the model doing 10x the amount of simplification, refactoring, and code pruning an effective senior level software engineer and architect would do? Is it doing 10x the detailed and agonizing architectural (re)work that a strong developer with honed architectural instincts would do?
And if you tell me it's all about accepting the LLM being in the driver's seat and embracing vibe coding, it absolutely does NOT work for anything exceeding a moderate level of complexity. I used to try that several times. Up to now no model is able to write a simple markdown viewer with certain specific features I have wanted for a long time. I really doubt the stories people tell about creating whole compilers with vide coding.
If all you see is and appreciate that it is pumping out 10x features, 10x more code, you are missing the whole point. In my experience you are actually producing a ton of sh*t, sorry.
months you say? how incredible. it beggars belief in fact
Not sure about ChatGPT, but Claude was (is still?) an absolute ripper at cracking some software if one has even a little bit of experience/low level knowledge. At least, that's what my friend told me... I would personally never ever violate any software ToA.
My anecdata is that it heavily depends on how much of the relevant code and instructions it can fit in the context window.
A small app, or a task that touches one clear smaller subsection of a larger codebase, or a refactor that applies the same pattern independently to many different spots in a large codebase - the coding agents do extremely well, better than the median engineer I think.
Basically "do something really hard on this one section of code, whose contract of how it intereacts with other code is clear, documented, and respected" is an ideal case for these tools.
As soon as the codebase is large and there are gotchas, edge cases where one area of the code affects the other, or old requirements - things get treacherous. It will forget something was implemented somewhere else and write a duplicate version, it will hallucinate what the API shapes are, it will assume how a data field is used downstream based on its name and write something incorrect.
IMO you can still work around this and move net-faster, especially with good test coverage, but you certainly have to pay attention. Larger codebases also work better when you started them with CC from the beginning, because it's older code is more likely to actually work how it exepects/hallucinates.