>Like anything, you have to decide between polish vs switch to any other task in the queue
Why do you "have to decide"? Let some agents go at both of those, isn't that what they claim people can just do?
>Also, Codex and Claude Code aren't as bad as people say. I think most of the noise is embellished by the "hah see? AI sucks" angle.
Why shouldn't it? They're not the ones making the extraordinary claims.
> Why do you "have to decide"? Let some agents go at both of those, isn't that what they claim people can just do?
Because your code still only moves forward at some number of tokens per second. You have to decide where those tokens go: polish or the next thing. Humans are still the ones prompting LLMs and deciding what gets done.
> isn't that what they claim?
> Why shouldn't it? They're not the ones making the extraordinary claims.
Even if I grant that someone else makes excessive claims, why would that let you off the hook for staying grounded yourself?
Though I don't grant it. Maybe if Anthropic claimed that Opus makes all decisions at the company and builds all software without any humans doing the prompting, the critics would make more sense.
Until then, it looks more like a double standard: if software built with AI has any issues, then see, AI is shit and the humans who invoked it had no role in it. For example, it could be the case that Anthropic's Claude Code engineers just aren't doing as much polish as they should.
Better answer: Someone asked why AI-written software might have issues, and that question has a real answer. Marketing claims are a different conversation.