Given the $10k price tag for tokens and the high rate of bugs (several per minute) they mention, it'd be very interesting to see this experiment run with cheaper models too.
I wonder if we'll get to a world where a full repo sweep like this is a default GitHub Action after every commit.
Most C/C++ projects I know don't even run tests with ASan/TSan/UBSan before each commit/merge.
And in the meantime, just a sweep of the committed code (or the to-be-committed code, for lots of us) and the code it interacts with is increasingly catching lots of problems.