Hacker News

greenavocado | last Sunday at 2:41 AM | 2 replies

If left to its own devices, Claude will quickly resort to writing passable-looking BS tests when the going gets tough, e.g. when it has to interface with stateful real-world systems and its tests struggle to pass.


Replies

botanical76 | last Sunday at 1:59 PM

This is my experience as well. If you want it to write good tests, you have to take a much more involved approach: first make it establish what needs testing in each module, then have it write each test one at a time and prove that the test can actually fail by modifying the source code to introduce a bug. Adjust the test as needed, rinse and repeat. I haven't done this much because it's very expensive in terms of time and premium tokens. Right now, I just write most tests myself so that at least I have faith in the verification suite.
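The "prove it can break the test" step is essentially a hand-run form of mutation testing. Below is a minimal Python sketch of the idea, assuming a toy `clamp` function and a hand-written mutant (every name here is hypothetical and not from the thread). A test is only trusted if it passes on the real code and fails once a bug is injected; a happy-path-only test passes on both, which is exactly the "passable-looking" kind.

```python
from typing import Callable

ClampFn = Callable[[int, int, int], int]

# Hypothetical function under test: restrict x to the closed interval [lo, hi].
def clamp(x: int, lo: int, hi: int) -> int:
    return max(lo, min(x, hi))

# Hypothetical injected bug (a "mutant"): the upper bound is silently ignored.
def clamp_mutant(x: int, lo: int, hi: int) -> int:
    return max(lo, x)

def check_clamp(fn: ClampFn) -> None:
    """A thorough test: covers in-range, below-range and above-range inputs."""
    assert fn(5, 0, 10) == 5
    assert fn(-3, 0, 10) == 0
    assert fn(42, 0, 10) == 10

def weak_check_clamp(fn: ClampFn) -> None:
    """A weak test: only the happy path, so the mutant slips through."""
    assert fn(5, 0, 10) == 5

def passes(check: Callable[[ClampFn], None], fn: ClampFn) -> bool:
    """Return True if `check` raises no AssertionError against `fn`."""
    try:
        check(fn)
    except AssertionError:
        return False
    return True

def is_meaningful(check: Callable[[ClampFn], None], real: ClampFn, mutant: ClampFn) -> bool:
    """A test earns trust only if it passes on the real code AND fails on the mutant."""
    return passes(check, real) and not passes(check, mutant)

print(is_meaningful(check_clamp, clamp, clamp_mutant))       # True
print(is_meaningful(weak_check_clamp, clamp, clamp_mutant))  # False
```

Dedicated mutation-testing tools automate generating the mutants, but the pass-on-real, fail-on-mutant criterion is the same one described above.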

tough | last Sunday at 3:17 AM

Claude is like an intern: someone has to code-review and approve its work before final delivery, IMHO.
