The implementation absolutely can influence the outputs.
If you have a sloppy implementation that somehow accumulates a lot of error in its floating point math, you will get worse results.
It's rarely talked about, but it's a real thing. Floating point addition and multiplication are non-associative, so the order of operations affects both correctness and performance. Developers might (unknowingly) trade correctness for performance. And it matters a lot more in the low precision modes we operate in today. Just try different methods of summing a vector containing 9,999 fp16 ones in fp16. Hint: it will never be 9,999.0 (that value isn't representable in fp16), and you won't get close to the best approximation if you do it in a naive loop.
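A minimal sketch of the experiment above, using NumPy's `float16` (assuming NumPy is available; the exact values follow from fp16's 10-bit mantissa):

```python
import numpy as np

ones = np.ones(9999, dtype=np.float16)

# Naive sequential accumulation in fp16: integers up to 2048 are exact,
# but at 2048 the fp16 spacing becomes 2, so 2048 + 1 rounds back to
# 2048 (round-to-nearest-even) and the running sum stalls there forever.
naive = np.float16(0.0)
for x in ones:
    naive = np.float16(naive + x)
print(naive)  # 2048.0

# Accumulating in fp32 (exact for integers up to 2^24) and rounding once
# at the end yields 10000.0 -- the closest fp16 value to 9999, since the
# fp16 spacing in [8192, 16384) is 8.
wide = np.float16(ones.sum(dtype=np.float32))
print(wide)  # 10000.0
```

The naive fp16 loop doesn't just lose a little precision; it silently caps the sum at 2048, off by almost 80%, while a wider accumulator lands on the best possible fp16 answer.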
TIL, thanks.
I thought all current implementations accumulate into an fp32 instead of accumulating in fp16.