I really want automated QA to work better! It's a great thing to work on.
Some feedback:
- I definitely don't want three long new messages on every PR. Max 1, ideally none? Codex does a great job just using emoji.
- The replay is cool. I don't make a website, so maybe I'm not the target market, but I'd like QA for our backend.
- Honestly, I'd rather just run a massive QA run every day, and then have any failures bisected, rather than per-PR.
- I am worried that there's not a lot of value beyond the intelligence of the foundation models here.
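
For concreteness, the daily-run-then-bisect idea could look roughly like the sketch below. It is only an illustration under some assumptions: `first_bad_commit` and `passes` are hypothetical names, `commits` is the ordered list of commits since the last green daily run, the first commit is known good, the last is known bad, and a failure persists once introduced.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], passes: Callable[[str], bool]) -> str:
    """Return the earliest commit at which the QA run fails.

    Assumptions (hypothetical setup, not any particular tool's API):
    - commits[0] is known good (the previous daily run passed there),
    - commits[-1] is known bad (today's daily run failed there),
    - once the failure appears, every later commit also fails.
    """
    if len(commits) < 2:
        raise ValueError("need at least one known-good and one known-bad commit")

    good, bad = 0, len(commits) - 1  # indices of the known-good and known-bad ends
    while bad - good > 1:
        mid = (good + bad) // 2
        if passes(commits[mid]):
            good = mid  # failure was introduced after mid
        else:
            bad = mid  # failure was introduced at or before mid
    return commits[bad]
```

With N commits landed since the last green run, this needs about log2(N) extra QA runs per failure instead of one run per PR, which is the trade-off being suggested.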
Thanks for the feedback!
- Agreed that the form factor can be condensed, with a link to detailed information.
- With the codebase understanding, backend is where we are looking to expand and provide value.
- The intelligence of the models does lay the foundation, but combining the strengths of these models unlocks a system of specialized agents that each reason about the codebase differently to catch the unknown unknowns.
Agree on your last point, and it's going to be a very bitter lesson. In any case, you probably wanna shift a lot of the code verification as far left as possible, so doing review at PR time isn't the right strat imo. And Claude/Codex are well positioned to do the local review.