> My issues all stem from that it works, but does the wrong thing
It's an opportunity, not a problem, because it means there's a gap in your specifications and, in turn, in your tests.
I use Aider, not Claude, but I run it with Anthropic models. What I've found is that writing up comprehensive, spec-style documentation for a feature before starting eliminates a huge amount of what you're referring to. It serves a triple purpose: (a) you get the documentation, (b) you guide the AI, and (c) surprisingly often it helps refine the feature itself. Sometimes I also have the AI help me write the spec, asking it to prompt me about areas that need clarification.
This is how Beads works, especially with Claude Code. I tell Claude to always create a Bead whenever I ask it to add something or mention something that needs to be added. Then I start brainstorming, and even ask it to do market research on what the top apps are doing for x, y, or z. Then I ask it to update the bead (I call them tasks), and finally, when it's got enough detail, I tell it to do all of them in parallel.