logoalt Hacker News

Broken Proofs and Broken Provers

54 pointsby RebelPotatotoday at 9:00 AM10 commentsview on HN

Comments

zero_ktoday at 4:49 PM

I still remember the time when a gcc bug caused MiniSat to output UNSAT for a satisfiable problem [1]. I was the author of a SAT solver, and I was chasing my tail trying to figure out why I was getting UNSAT for a satsifiable problem. I have to admit I didn't expect it to be a gcc bug... (note: bug was found by Vegard Nossum on the CryptoMiniSat mailing list)

[1] https://www.msoos.org/2013/04/gcc-4-5-2-at-sat-competition-2...

N_Lenstoday at 1:45 PM

The multithreading bug in Isabelle is fascinating from a systems design perspective: threads racing ahead assuming a stuck proof will succeed, creating false positives. That's exactly the kind of subtle concurrency issue that makes verification tools need... verification tools.

show 2 replies
LegionMammal978today at 3:12 PM

I'm a bit skeptical about taking the rates of reported soundness bugs between different systems and drawing conclusions about the underlying approaches. There's typically a class of bugs that users can stumble into by doing something unusual, and then another class of bugs that can only really be found by exploiting holes in the implementation. The first class depends on which features get exercised the most, the second class depends on how many eyes are on the source (and how accessible it is), and both classes heavily depend on the overall size of the userbase.

E.g., Metamath is designed to be as theoretically simple as possible, to the point that it's widely considered a toy in comparison to 'serious' proof systems: a verifier is mainly just responsible for pushing around symbols and strings. In spite of this simplicity, I was able to find soundness bugs in a couple major verifiers, simply because few people use the project to begin with, and even fewer take the time to pore over the implementations.

So I'd be hesitant to start saying that one approach is inherently more or less bug-prone than another, except to be slightly warier in general of larger or less accessible kernels.

show 1 reply