Hacker News

AdieuToLogic, yesterday at 8:01 AM (2 replies)

>> Compilers are deterministic, making their generated assembly code verifiable

> People keep saying this like it is an absolute fact, whereas in reality it is a scale.

My statement is of course a generalization due to its terseness. It focuses on the expectation of repeatable results given constant input, excluding pathological sources of nondeterminism such as compiler-defined macro values or implementation defects. That modern compilers are complex systems is true, but it is not really my point.

> This leads to the point: in general do we care about this non-determinism?

> Most of the time, no we don't.

Not generally the type of nondeterminism I described, no. Nor the nondeterministic value of the `__DATE__` macro referenced in the StackOverflow link you provided.
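To make the distinction concrete, here is a minimal sketch in Python. It does not invoke a real compiler: `toy_compile` is a hypothetical stand-in whose output depends only on its arguments, and the optional `build_date` argument plays the role of a `__DATE__`-style stamp. The source string and dates are made up for illustration.

```python
import hashlib
from typing import Optional


def toy_compile(source: str, build_date: Optional[str] = None) -> bytes:
    """Hypothetical stand-in for a compiler: a pure function of its arguments."""
    # The banner mimics a __DATE__-style stamp embedded in the artifact.
    banner = f"built:{build_date}\n" if build_date is not None else ""
    return (banner + source.upper()).encode("utf-8")


def digest(artifact: bytes) -> str:
    """Fingerprint an artifact so two builds can be compared byte for byte."""
    return hashlib.sha256(artifact).hexdigest()


source = "mov eax, 1"

# Constant input, repeatable output: both digests are identical.
repeatable = digest(toy_compile(source)) == digest(toy_compile(source))

# Same source, different date stamp: the artifacts differ, but only in a
# known, explainable way (the stamp), not in the translated code itself.
stamped_differs = (
    digest(toy_compile(source, "Jan  1 2024"))
    != digest(toy_compile(source, "Jan  2 2024"))
)
```

The point of the sketch is that the date stamp is an input like any other. Hold it constant and the result is repeatable again.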

> Once you accept that the next stage is accepting that most of the time the non-deterministic output of an LLM is good enough!

This is where the wheels fall off.

First, "most of the time" only makes sense when there is another disjoint group of "other times." Second, the preferred group defined is "non-deterministic [sic] output of an LLM is good enough", which means the "other times" are when LLM use is not good enough. Third, and finally, when use of an approach (or a tool) is unpredictable (again, excluding pathological cases) given the same input, it requires an open set of tests to verify correctness over time.

That last point may not be obvious, so I will elaborate on why it holds.

Assuming the LLM in use has, or is reasonably expected to have, model evolution, documents generated by same will diverge unpredictably given a constant prompt. This implies prompt evolution will also be required, at a frequency almost certainly different from that of the unpredictable document generation intrinsic to LLMs. This in turn implies that test expectations and/or definitions have to evolve over time with nothing changing other than undetectable model evolution. That means any testing which exists at one point in time cannot be relied upon to provide the same verifications at a later point in time. Thus the requirement of an open set of tests to verify correctness over time.
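The chain of reasoning above can be sketched in a few lines of Python. Nothing here calls a real model: `toy_generate` is a hypothetical stand-in whose output depends on the prompt and on a model version the caller does not control, and the version labels and prompt are invented for illustration.

```python
import hashlib


def toy_generate(prompt: str, model_version: str) -> str:
    """Hypothetical stand-in for an LLM call.

    The output depends on the prompt and on a model version that,
    in this scenario, changes without the caller being told.
    """
    seed = hashlib.sha256(f"{model_version}:{prompt}".encode("utf-8")).hexdigest()
    return f"doc-{seed[:12]}"


prompt = "Write a function that parses ISO 8601 dates."

# A test expectation recorded at one point in time.
golden = toy_generate(prompt, "model-v1")

# While the model is unchanged, the recorded expectation keeps holding.
still_passes = toy_generate(prompt, "model-v1") == golden

# After a model update, the same prompt no longer satisfies the same test,
# even though neither the prompt nor the test was touched.
passes_after_update = toy_generate(prompt, "model-v2") == golden
```

In this toy setup the only thing that changed between the two checks is the model version, which is the situation described above: the expectation has to be revised for reasons that are invisible from the prompt and the test alone.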

Finally, to answer your question of:

> how do I verify it is good enough

You can't, because what you describe is a multi-story brick house built on a sand dune.

Replies

nl, yesterday at 11:32 AM

> Assuming the LLM in use has, or is reasonably expected to have, model evolution, documents generated by same will diverge unpredictably given a constant prompt.

So what?

You tell it once. It writes code.

You test that code, not the prompt.
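One possible reading of this workflow, as a rough Python sketch: the generated code is saved once as a fixed artifact, pinned by hash, and the tests run against that artifact rather than against a fresh generation. Everything here is hypothetical, including `GENERATED_SOURCE`, `load_pinned`, and the `add` function standing in for whatever the model wrote.

```python
import hashlib

# Hypothetical output of a single generation run, saved verbatim after review.
GENERATED_SOURCE = '''
def add(a, b):
    return a + b
'''

# Hash recorded at review time; later model changes cannot alter this artifact.
PINNED_SHA256 = hashlib.sha256(GENERATED_SOURCE.encode("utf-8")).hexdigest()


def load_pinned(source: str, expected_sha256: str) -> dict:
    """Execute the saved source only if it still matches the recorded hash."""
    actual = hashlib.sha256(source.encode("utf-8")).hexdigest()
    if actual != expected_sha256:
        raise ValueError("artifact changed since it was reviewed")
    namespace: dict = {}
    exec(source, namespace)  # acceptable here only because the source was reviewed
    return namespace


module = load_pinned(GENERATED_SOURCE, PINNED_SHA256)

# The test targets the saved code, not the prompt that produced it.
result = module["add"](2, 3)
```

Under this reading the tests stay valid for as long as the pinned artifact is unchanged. The concern raised in the parent comment applies again whenever the code is regenerated from the prompt.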