This is a central problem that weve already seen proliferate wildly in Scientific research , and currently if the same is allowed to be embedded in foundational code. The future outlook would be grim.
Replication crisis[1].
Given initial conditions and even accounting for 'noise' would a LLm arrive at the same output.It should , for the same reason math problems require one to show their working. Scientific papers require the methods and pseudocode while also requireing limitations to be stated.
Without similar guardrails , maintainance and extension of future code becomes a choose your own adventure.Where you have to guess at the intent and conditions of the LLM used.
[1] https://www.ipr.northwestern.edu/news/2024/an-existential-cr...
Even if you pin the seed and spin up your own local LLM, changes to continuous batching at the vLLM level or just a different CUDA driver version will completely break your bitwise float convergence. Reproducibility in ML generation is a total myth, in prod we only work with the final output anyway
You can leave commit messages or comments without spamming your history with every "now I'm inspecting this file..." or "oops, that actually works differently than I expected" transcript.
In fact, I'd wager that all that excess noise would make it harder to discern meaningful things in the future than simply distilling the meaningful parts of the session into comments and commit messages.
> for the same reason math problems require one to show their working.
We don't put our transitional proofs in papers, only the final best one we have. So that analogy doesn't work.
For every proof in a paper there is probably 100 non-working / ugly sketches or just snippets of proofs that exist somewhere in a notebook or erased on a blackboard.
but we've been doing the same without llm. what're the new pieces which llm would bring in?
Agentic engineering is fundamentally different, not just because of the inherent unpredictability of LLMs, but also because there's a wildly good chance that two years from now Opus 4.6 will no longer even be a model anyone can use to write code with.