Does this tool assume that testing two commits once each takes the same time as testing one commit twice? That may hold for interpreted languages, but if you're waiting 15 minutes to compile LLVM, you'll probably want to run your 1-second flaky test more than once per build. Probably pretty trivial to fix, though?
Great idea anyway!