vessenes • today at 9:47 AM • 2 replies

Red/green is especially good with Claude because even now, with Opus 4.6, Claude can throw out a little comment like “//Implementation on hold until X/Y/Z: return { true }” and then skip the implementation entirely, on the strength of that inline comment, for a long time. It used to do this aggressively even in the tests, but by and large red/green prompting helps immensely: it tells the agent “think of failing tests as SUCCESS right now”, and then you’ll get lots of them.
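
To make the red/green idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: `is_valid_username` and its rules are hypothetical, not something from a real project. The same tests run first against a placeholder stub (red, failures expected and wanted) and then against a real implementation (green).

```python
import unittest


def is_valid_username_stub(name: str) -> bool:
    # Placeholder of the kind described above: it always claims success.
    # Implementation on hold until X/Y/Z
    return True


def is_valid_username(name: str) -> bool:
    # Hypothetical real rule: 3 to 16 characters, alphanumerics or underscore.
    return 3 <= len(name) <= 16 and all(c.isalnum() or c == "_" for c in name)


def make_suite(func):
    """Build the same test case against whichever implementation is passed in."""

    class UsernameTest(unittest.TestCase):
        def test_accepts_simple_name(self):
            self.assertTrue(func("alice_01"))

        def test_rejects_too_short(self):
            self.assertFalse(func("ab"))

        def test_rejects_punctuation(self):
            self.assertFalse(func("bad name!"))

    return unittest.defaultTestLoader.loadTestsFromTestCase(UsernameTest)


def run(func) -> unittest.TestResult:
    # Collect results quietly instead of printing a test report.
    result = unittest.TestResult()
    make_suite(func).run(result)
    return result


red = run(is_valid_username_stub)  # red phase: the stub should fail
green = run(is_valid_username)     # green phase: real logic should pass
print("red failures:", len(red.failures))      # 2: both rejection tests fail
print("green failures:", len(green.failures))  # 0
```

The point of the red phase is that a stub which just returns true cannot survive tests that assert a rejection, so the skipped implementation shows up as failures instead of hiding behind a comment.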

I’ve always been partial to integration tests too. Hand coding made integration tests feel bad; you were almost doubling the code output in some cases, especially if you ended up needing to mock a bunch of servers. Nowadays that’s cheap, which is super helpful.
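
For a sense of what "mocking a server" costs in code, here is a rough standard-library sketch in Python. The service, its `/users/1` route, and `fetch_user_name` are all made up for illustration; a real suite would likely have several of these fakes, which is where the doubling comes from.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeUserService(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a real upstream service."""

    def do_GET(self):
        if self.path == "/users/1":
            body = json.dumps({"id": 1, "name": "alice"}).encode()
            self.send_response(200)
        else:
            body = json.dumps({"error": "not found"}).encode()
            self.send_response(404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def fetch_user_name(base_url: str, user_id: int) -> str:
    # Code under test (hypothetical): talks to the service over real HTTP.
    # The empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"{base_url}/users/{user_id}", timeout=5) as resp:
        return json.load(resp)["name"]


def run_integration_check() -> str:
    # Port 0 asks the OS for any free port, so parallel runs don't collide.
    server = HTTPServer(("127.0.0.1", 0), FakeUserService)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        return fetch_user_name(f"http://{host}:{port}", 1)
    finally:
        server.shutdown()
        server.server_close()


name = run_integration_check()
print(name)
```

Most of the lines are the fake, not the check itself, which is roughly the overhead that used to make these feel bad to write by hand.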

Replies

jghn • today at 1:42 PM

Granted, it doesn't always pay attention to CLAUDE.md, but one thing I've done is add, to the block of rules it must always follow, a rule to never leave something unimplemented with placeholders unless explicitly told to do so. It's made this mostly go away for me.

sd9 • today at 10:18 AM

Yeah, I've always _preferred_ integration tests, but the cost of building them was so great. Now the cost is effectively eliminated, and if you make a change that genuinely does affect an integration test (changing the text on a button, for example) it's easy to smart-find-and-replace and fix them up. So I'm using them a lot more.

The only problem is... they still take much longer to _run_ than unit tests, and they do tend to be more flaky (although Claude is helpful in fixing flaky tests too). I'm grateful for the extra safety, but it makes deployments that much slower. I've not really found a solution to that part beyond parallelising.
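
For what I mean by parallelising, a toy Python sketch follows. The check names and the sleeps are placeholders standing in for slow, I/O-bound integration tests; it only helps when the tests don't share state.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_check(name: str, seconds: float) -> tuple[str, bool]:
    # Stand-in for an I/O-bound integration test (hypothetical).
    time.sleep(seconds)
    return name, True


# Hypothetical test names, each pretending to take 0.2 seconds.
CHECKS = [("login", 0.2), ("checkout", 0.2), ("search", 0.2), ("profile", 0.2)]


def run_parallel(checks, workers: int = 4) -> dict[str, bool]:
    # Threads are enough here because the waiting is I/O, not CPU.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(slow_check, n, s) for n, s in checks]
        return dict(f.result() for f in futures)


start = time.perf_counter()
results = run_parallel(CHECKS)
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

With four workers the wall time should land near the slowest single check instead of the sum of all four. In a pytest project the usual route is the pytest-xdist plugin (`pytest -n auto`) rather than hand-rolled threads.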
