What was your prompt to get it to run the test suite and heal tests at every step? I didn’t see that mentioned in your write up. Also, any specific reason you went with Codex over Claude Code?
For me (original author of JustHTML), it was enough the put the instructions on how to run tests in the AGENTS.md. It knows enough about coding to run tests by itself.
All of the prompts I used are in the article. The two most relevant to testing were:
And later: Good coding models don't need much of a push to get heavily into automated testing.I used Codex for a few reasons:
1. Claude was down on Sunday when I kicked off tbis project
2. Claude Code is my daily driver and I didn't want to burn through my token allowance on an experiment
3. I wanted to see how well the new GPT-5.2 could handle a long running project