The biggest problem I've seen with CI isn't the failing part, it's what teams do when...

mihir_kanzariya • today at 3:15 PM • 3 replies

The biggest problem I've seen with CI isn't the failing part, it's what teams do when it fails. The "just rerun it" culture kills the whole point.

We had a codebase where about 15% of CI runs were flaky. Instead of fixing the root causes (mostly race conditions in tests and one service that would intermittently time out), the team just added auto-retry: three attempts before it actually reported failure. So now a genuinely broken build takes 3x longer to tell you it's broken, and the flaky stuff just gets swept under the rug.
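
To make the cost concrete, here is a minimal sketch of that kind of auto-retry wrapper. It is not the team's actual setup: `run_with_retries`, `always_broken`, and `make_flaky` are hypothetical names invented for illustration, and a CI job is modeled as a plain function that returns True on success.

```python
from typing import Callable, Tuple


def run_with_retries(job: Callable[[], bool], max_attempts: int = 3) -> Tuple[bool, int]:
    """Run job until it passes or attempts run out; return (passed, attempts_used)."""
    for attempt in range(1, max_attempts + 1):
        if job():
            return True, attempt
    return False, max_attempts


def always_broken() -> bool:
    # Hypothetical stand-in for a genuinely broken build: fails on every attempt.
    return False


def make_flaky(failures_before_pass: int) -> Callable[[], bool]:
    # Hypothetical stand-in for a flaky job: fails a fixed number of times, then passes.
    remaining = [failures_before_pass]

    def job() -> bool:
        if remaining[0] > 0:
            remaining[0] -= 1
            return False
        return True

    return job


# A real breakage burns all three attempts before anyone is told.
broken_passed, broken_attempts = run_with_retries(always_broken)

# A job that flakes twice still ends up green, so nobody looks at it.
flaky_passed, flaky_attempts = run_with_retries(make_flaky(2))
```

In this sketch the broken build is only reported after the third attempt, and the flaky job is reported as a pass even though it failed twice along the way.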

The article's right that failure is the point, but only if someone actually investigates the failure instead of clicking retry.

Replies

jbstack • today at 4:06 PM

I don't understand this at all. Why not just skip CI altogether if you're not interested in the results?

flowerbreeze • today at 3:29 PM

The "just retry" approach is truly bothersome. I think it is at least partly an organizational issue, because it happens far more often when QA is a separate team.

ant6n • today at 3:52 PM

If a build fails 10% of the time, it actually takes 100x longer for it to fail, because with three attempts it only fails in the 10% x 10% x 10% case.
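
The arithmetic behind that figure can be checked with a short sketch. It assumes the three attempts fail independently of each other, which real flaky tests may not do; the variable names are made up for illustration.

```python
from fractions import Fraction

# Assumption: each attempt fails independently with the same probability.
flake_rate = Fraction(1, 10)  # 10% of single attempts fail
attempts = 3                  # failure is reported only if all three fail

# Probability that a run is actually reported as failed.
reported_failure_rate = flake_rate ** attempts

# Expected number of runs before one reported failure, without and with retries.
runs_per_failure_without_retry = 1 / flake_rate
runs_per_failure_with_retry = 1 / reported_failure_rate

# How much longer it takes, on average, before the flake shows up as a failure.
slowdown = runs_per_failure_with_retry / runs_per_failure_without_retry
```

Under that independence assumption, a reported failure goes from roughly one run in 10 to one run in 1000, which is where the 100x comes from.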
