an0malous • yesterday at 8:16 PM • 6 replies

Aren’t transformers intrinsically deterministic? I thought the randomness was intentional to make chatbots seem more natural, and OpenAI used to have a seed parameter you could set for deterministic output. I don’t know why that feature isn’t more popular, given the reasons this article outlines.

Replies

jkaptur • yesterday at 9:23 PM

(I'm not an expert. I'd love to be corrected by someone who actually knows.)

Floating-point arithmetic is not associative. (A+B)+C does not necessarily equal A+(B+C), but you can get a performance improvement by calculating A, B, and C in parallel, then adding together whichever two finish first. So, in theory, transformers can be deterministic, but in a real system they almost always aren't.
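For anyone who wants to see the non-associativity directly, here is a minimal sketch in plain Python. The three values are arbitrary picks for illustration and have nothing to do with a real model; the only point is that the grouping of the additions changes the rounded result.

```python
# Floating-point addition is not associative: the grouping decides
# which intermediate result gets rounded first.
a, b, c = 0.1, 0.2, 0.3  # arbitrary illustrative values

left = (a + b) + c   # rounds a + b, then adds c
right = a + (b + c)  # rounds b + c, then adds a

print(left == right)  # False on IEEE 754 doubles
print(left, right)    # 0.6000000000000001 0.6
```

The difference is in the last bit here, but a parallel reduction over thousands of terms can group them differently from run to run, and that is where the run-to-run drift would come from.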

janalsncm • yesterday at 9:30 PM

Transformers are just a special kind of binary which is run by inference code. Where the rubber meets the road is whether the inference setup is deterministic. There’s some literature on this: https://thinkingmachines.ai/blog/defeating-nondeterminism-in...

I don’t think the issue is determinism per se but chaotic predictions that are difficult to rely on.

solsane • yesterday at 9:51 PM

Well, you could say that about computers in general. I'm assuming you're referring to temperature (or something similar), which can be set to always pick the most probable token. Floats aside, this should be deterministic. But practically I don't think that changes much, since adjusting the input slightly can lead to very different output. Also, back in the day, a nonzero temperature helped models avoid getting stuck in repetitive loops.
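A rough sketch of what the temperature knob does, assuming the usual softmax-over-logits setup. The function names and the logit values are made up for illustration and are not any particular library's API.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(logits):
    """The temperature-to-zero limit: always take the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])

logits = [2.0, 1.0, 0.5]  # made-up scores for three hypothetical tokens
warm = softmax_with_temperature(logits, 1.0)   # probability spread across tokens
cold = softmax_with_temperature(logits, 0.05)  # almost all mass on the top token
top = greedy_pick(logits)                      # same answer on every call
```

With greedy picking there is no random draw left, so any remaining variation has to come from the logits themselves differing between runs.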

esafak • yesterday at 10:05 PM

The models generate a token distribution. Which one to pick is a choice. One can sample from the distribution, hence the randomness.
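A small sketch of that choice, using Python's standard `random` module. The probabilities and helper names are invented for illustration; a real inference stack would sample over a full vocabulary. It also shows why a seed makes the sampling step repeatable: the same seed and the same distribution give the same draws.

```python
import random

probs = [0.6, 0.3, 0.1]  # made-up distribution over three hypothetical tokens

def sample_token(probs, rng):
    """Draw one token index from a categorical distribution using the given RNG."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def draw_sequence(seed, n=10):
    """Sample n token indices; the private RNG means the seed fully determines them."""
    rng = random.Random(seed)
    return [sample_token(probs, rng) for _ in range(n)]

run_a = draw_sequence(seed=42)
run_b = draw_sequence(seed=42)
print(run_a == run_b)  # True: same seed, same distribution, same picks
```

This only pins down the sampling step. If the distribution itself shifts slightly between runs, the same seed can still land on a different token.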

bpodgursky • yesterday at 9:32 PM

Strict deterministic output for a given prompt prevents the use of RAG, which increasingly limits the relative utility of an LLM within an organization.

ares623 • yesterday at 8:40 PM
