>I've certainly felt the "mimics good code" thing in the past. Yup, that's what makes reading LLM code far more intense for me in a bad way.
With a human, I'm reading at a higher level than line by line: I can think "hey this person is a senior dev new to the company, so I can assume some basics, let's focus on business assumptions he might not know", or "this is a junior writing async code, danger, better check for race conditions". With LLMs there's no assumption, you can get a genius application of a design pattern tested by a silly assert.Equal(true, true).
>I've started forcing Claude Code into a red/green TDD cycle for almost everything which makes it much less likely to write code that it hasn't at least executed via the tests.
Funnily, that was my train of thought to keep it tamed as well, but I had very mixed results. I've used cursor more than claude, but with both I had trouble to get it to follow TDD patterns: It would frequently create a red-phase test, then realise it doesn't pass (as expected), think that was an error on its part, and so it would change the test to pass when the bug is reproduced, giving green for the wrong behavior. This pattern reemerged constantly even if corrected.