jkaptur • yesterday at 9:23 PM • 1 reply

(I'm not an expert. I'd love to be corrected by someone who actually knows.)

Floating-point arithmetic is not associative: (A+B)+C does not necessarily equal A+(B+C), because each addition rounds its result. You can get a performance improvement by calculating A, B, and C in parallel, then adding together whichever two finish first, but then the order of the additions, and so the rounded result, can change from run to run. So, in theory, transformers can be deterministic, but in a real system they almost always aren't.
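To make the non-associativity point concrete, here is a minimal Python sketch using ordinary 64-bit floats. It is only an illustration: the helper `sum_in_order` and the sample values are made up for this example, and it says nothing about how any particular inference kernel schedules its additions.

```python
# Illustration only: floating-point addition depends on grouping and order.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)  # False


def sum_in_order(values, order):
    """Hypothetical helper: add values in the given index order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


# Same three numbers, two different addition orders.
values = [1e16, 1.0, -1e16]
in_order = sum_in_order(values, [0, 1, 2])   # the 1.0 is lost to rounding
reordered = sum_in_order(values, [0, 2, 1])  # the big terms cancel first
print(in_order, reordered)  # 0.0 1.0
```

If the order is decided by whichever partial result happens to finish first, the same inputs can land on either answer.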

Replies

10000truths • yesterday at 10:22 PM

Not an expert either, but my understanding is that large models use quantized weights and tensor inputs for inference. Addition and multiplication of fixed-point values are associative, so unless there's an intermediate "convert to/from IEEE float" step (activation functions, maybe?), you can still build determinism into a performant model.
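A rough way to see the fixed-point side of this argument: sum the same numbers in every possible order, once as floats and once as integers after quantizing. This is a sketch under assumptions, not a model of real inference code. `SCALE` and `running_sum` are hypothetical, and it only covers integer accumulation, not any step that converts back to floating point.

```python
from itertools import permutations

SCALE = 2 ** -7  # hypothetical fixed-point step size (assumption)

values = [0.1, 0.2, 0.3]
# Quantize each value to an integer number of SCALE-sized steps.
quantized = [round(v / SCALE) for v in values]


def running_sum(items, start):
    """Hypothetical helper: add items left to right, starting from start."""
    total = start
    for item in items:
        total += item
    return total


# Collect the distinct results over every ordering of the inputs.
float_sums = {running_sum(p, 0.0) for p in permutations(values)}
int_sums = {running_sum(p, 0) for p in permutations(quantized)}

print(len(float_sums))  # more than one distinct float result
print(len(int_sums))    # exactly one integer result, whatever the order
```

The integer sum is the same for every ordering, which is the property the comment relies on. The cost is the quantization error introduced before any addition happens.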
