> The point is exactly that, that ai feels like reviewing other people’s code, only worse because bad ai written code mimics good code in a way that bad human code doesn’t, and because you don’t get the human factor of mentoring someone when you see they lack a skill.
Yeah, that's a good way to put it.
I've certainly felt the "mimics good code" thing in the past. It's been less of a problem for me recently, maybe because I've started forcing Claude Code into a red/green TDD cycle for almost everything which makes it much less likely to write code that it hasn't at least executed via the tests.
The mentoring thing is really interesting - it's clearly the biggest difference between working with a coding agent and coaching a human collaborator.
I've managed to get a weird simulacrum of that by telling the coding agents to take notes as they work - I even tried "add to a til.md document of things you learned" on a recent project - and then condensing those lessons into an AGENTS.md later on.
>I've certainly felt the "mimics good code" thing in the past. Yup, that's what makes reading LLM code far more intense for me in a bad way.
With a human, I'm reading at a higher level than line by line: I can think "hey this person is a senior dev new to the company, so I can assume some basics, let's focus on business assumptions he might not know", or "this is a junior writing async code, danger, better check for race conditions". With LLMs there's no assumption, you can get a genius application of a design pattern tested by a silly assert.Equal(true, true).
>I've started forcing Claude Code into a red/green TDD cycle for almost everything which makes it much less likely to write code that it hasn't at least executed via the tests.
Funnily, that was my train of thought to keep it tamed as well, but I had very mixed results. I've used cursor more than claude, but with both I had trouble to get it to follow TDD patterns: It would frequently create a red-phase test, then realise it doesn't pass (as expected), think that was an error on its part, and so it would change the test to pass when the bug is reproduced, giving green for the wrong behavior. This pattern reemerged constantly even if corrected.