This is a nice idea, and one I also advocate for. However, it's important to keep in mind that reproducibility relies on determinism, and much of what goes into a build pipeline is inherently nondeterministic: we're making decisions at compile time that can differ from one compilation run to the next, flags aside. In fact, that's the point of an optimizing compiler, and as many reproducible-build projects have discovered, turning on optimizations pretty much guarantees no reproducibility.
Counterpoint: Archlinux is 89% reproducible with optimizations enabled. The only thing I can see that's difficult to make reproducible is optimization with a timeout.
This is defeatist: compilers don't usually use the system RNG to make decisions, so what's happening is the entirely accidental introduction of differences, which then propagate.
There is "intentional input" (contents of the source files), and "accidental input" (source file full paths, timestamps, layout of memory given to you by the OS, and so on). A reproducible build system should give the same output for the same "intentional input".
(The only place where you do see RNG-driven optimization is in things like FPGA routing, which is a mess of closed toolchains anyway. It has no place in regular software compilers.)
Why does an optimizing compiler introduce nondeterminism?
In my mind an optimizing compiler is a pure function that takes source code and produces an object file.
"Reproducible" isn't necessary for "not modified from what everyone else gets", and that still makes some attacks FAR harder (and easier to identify, as you know what the "normal" one is). And a published Merkle tree just makes it easier to verify "none of this has changed", as opposed to SHAs on a website that could change any time.
As long as the compiler isn't optimizing by "let's give this 3 seconds of solving time, then continue if no better solution is found", optimizing is not inherently nondeterministic.
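To illustrate why the wall-clock budget is the problem case, here's a toy hill-climbing "optimizer" (made-up search problem, not how any real compiler pass works): bounded by a fixed iteration count it's fully deterministic, bounded by seconds its result depends on how fast the machine happens to be that run.

```python
import random
import time

def optimize(budget_iters=None, budget_seconds=None, seed=0):
    """Toy search with a seeded PRNG, so the only nondeterminism is the stopping rule."""
    rng = random.Random(seed)
    best = 0.0
    start = time.monotonic()
    i = 0
    while True:
        if budget_iters is not None and i >= budget_iters:
            break
        if budget_seconds is not None and time.monotonic() - start >= budget_seconds:
            break
        best = max(best, rng.random())   # pretend this is "found a better schedule"
        i += 1
    return best, i

# Iteration budget: same answer on every machine, every run.
print(optimize(budget_iters=100_000))
# Time budget: how far the search gets depends on CPU speed and load,
# so the "best solution found so far" can differ between runs and machines.
print(optimize(budget_seconds=0.01))
```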