Hacker News

D-Machine, today at 5:28 AM (2 replies)

> You have the source code though. That is the "reproducibility" bit you need.

I am talking about reproducing the (perhaps erroneous) logic, thinking, or motivations behind a bug, not reproducing outputs perfectly. As you said, current LLMs are non-deterministic, so we can't have perfect reproducibility based on the prompts alone. But when trying to fix a bug, having the basic prompts lets us see whether we run into similar issues given the same (possibly bad) prompt. That tells us whether the bad or bugged code was just a random spasm, or something reflecting bad or missing logic in the prompt.

> Is it important to know that when the foo file was refactored the developer chose to do it by hand vs letting the IDE do it with an auto-refactor command vs just doing a simple find and replace? Maybe it is for code review purposes, but for "reproducibility" I don't think it is.

I am using "reproducibility" more abstractly here, and don't mean perfect reproducibility of the same code. That is, consider this situation: "A developer said AI wrote this code according to these specs and this prompt, which, according to all reviewers, shouldn't produce the errors and bad code we are seeing. Let's see if we can indeed reproduce similar code given their specs and prompt." The less evidence we have of the specifics of a session, the less reproducible their generated code is, in this sense.


Replies

xmcqdpt2, today at 12:31 PM

It's not reproducible though.

Even with the exact same prompt and model, you can get dramatically different results, especially after a few iterations of the agent loop. In practice you often can't even pin down the prompt and model: most tools don't let you pick the model snapshot and don't let you change the system prompt, and you would have to make sure you have the exact same user config too. And once the model runs code, you aren't going to get the same outputs in most cases (there will be datetimes, logging timestamps, different host names and user names, etc.).
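The divergence-from-sampling point can be illustrated with a toy sketch (all probabilities and tokens here are made up; this is not any real model's API, just an assumed stand-in for temperature sampling):

```python
import random

def sample_token(dist, temperature, rng):
    """Sample one token from a toy next-token distribution.

    With temperature > 0 the model samples rather than always taking
    the most likely token, so two runs of the exact same prompt can
    produce different continuations.
    """
    # Temperature reshapes the distribution: t > 1 flattens it,
    # t < 1 sharpens it toward the most likely token.
    weights = [p ** (1.0 / temperature) for p in dist.values()]
    return rng.choices(list(dist.keys()), weights=weights, k=1)[0]

# Made-up next-token probabilities after the same prompt prefix.
dist = {"for": 0.5, "while": 0.3, "if": 0.2}

# Two "runs" of the same prompt with different RNG states:
rng_a, rng_b = random.Random(1), random.Random(2)
run_a = [sample_token(dist, 0.8, rng_a) for _ in range(5)]
run_b = [sample_token(dist, 0.8, rng_b) for _ in range(5)]
# The runs can diverge at any step, and in an agent loop each
# divergent token changes the context for every later step,
# so small differences compound across iterations.
```

This is why pinning the prompt alone isn't enough: unless the tool also exposes the sampler state, the model snapshot, and the system prompt, the run is not replayable.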

I generally avoid even reading the LLM's own text (and I wish it produced less of it really) because it will often explain away bugs convincingly and I don't want my review to be biased. (This isn't LLM specific though -- humans also do this and I try to review code without talking to the author whenever possible.)

newswasboring, today at 12:36 PM

If I understand correctly, you are talking about documenting the intent of a piece of software. But isn't that what READMEs and comments are for?