Hacker News

storystarling, yesterday at 10:26 PM

I ended up overengineering a LangGraph workflow to handle this. It forces the LLM to generate its own tests and pass them in a sandbox before I even see the PR. The API costs are significantly higher because of the retry loops, but it filters out the low-effort attempts.
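
A minimal sketch of that kind of gate, in plain Python rather than LangGraph, so it is not the workflow described above. The `generate` callable is a hypothetical stand-in for the LLM call, the names `run_tests` and `gated_generate` are made up for illustration, and a temp directory plus a child process is only a placeholder for a real sandbox.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple

# Hypothetical LLM wrapper: takes the failure output of the previous attempt
# (empty on the first call) and returns (solution_source, test_source).
Generator = Callable[[str], Tuple[str, str]]


def run_tests(solution: str, tests: str, timeout: float = 30.0) -> Tuple[bool, str]:
    """Write both files to a temp dir and run the test script in a child process.

    Assumption: this is NOT real isolation, just a stand-in for a sandbox.
    """
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "solution.py").write_text(solution)
        Path(tmp, "test_solution.py").write_text(tests)
        try:
            proc = subprocess.run(
                [sys.executable, "test_solution.py"],
                cwd=tmp,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False, "timed out"
        # A failing assert in the test script exits non-zero.
        return proc.returncode == 0, proc.stdout + proc.stderr


def gated_generate(generate: Generator, max_attempts: int = 3) -> Optional[str]:
    """Return the first solution whose own tests pass, or None if none do."""
    feedback = ""
    for _ in range(max_attempts):
        solution, tests = generate(feedback)  # one model call per attempt
        ok, output = run_tests(solution, tests)
        if ok:
            return solution
        feedback = output  # feed the failure back into the next attempt
    return None
```

Each pass through the loop is another model call, which is where the extra API cost comes from. Returning `None` after `max_attempts` is what drops an attempt before it ever becomes a PR.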