Hacker News

jiggawatts 10/11/2024

I thought all current implementations accumulate into fp32 instead of accumulating in fp16.
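For illustration, a minimal CUDA sketch (hypothetical code, not any particular library's kernel) that takes the same fp16 inputs and accumulates the dot product once into fp32 and once into fp16; with enough small terms the fp16 accumulator visibly loses low-order bits. Needs sm_53+ for half arithmetic.

    #include <cuda_fp16.h>
    #include <cstdio>

    // Accumulate the same fp16 dot product two ways: into fp32 and into fp16.
    __global__ void dot_compare(const __half* a, const __half* b, int n,
                                float* out_fp32, float* out_fp16) {
      float  acc32 = 0.0f;                 // fp32 accumulator
      __half acc16 = __float2half(0.0f);   // fp16 accumulator
      for (int i = 0; i < n; ++i) {
        acc32 += __half2float(a[i]) * __half2float(b[i]); // product kept in fp32
        acc16 = __hadd(acc16, __hmul(a[i], b[i]));        // rounds to fp16 each step
      }
      *out_fp32 = acc32;
      *out_fp16 = __half2float(acc16);
    }

    int main() {
      const int n = 4096;
      __half *a, *b;
      float *o32, *o16;
      cudaMallocManaged(&a, n * sizeof(__half));
      cudaMallocManaged(&b, n * sizeof(__half));
      cudaMallocManaged(&o32, sizeof(float));
      cudaMallocManaged(&o16, sizeof(float));
      for (int i = 0; i < n; ++i) {
        a[i] = __float2half(0.01f);
        b[i] = __float2half(1.0f);
      }
      dot_compare<<<1, 1>>>(a, b, n, o32, o16);
      cudaDeviceSynchronize();
      // Once the running sum grows, the fp16 accumulator can no longer absorb
      // the small increments, so the two results drift apart.
      printf("fp32 accumulate: %f\nfp16 accumulate: %f\n", *o32, *o16);
      return 0;
    }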


Replies

janwas 10/12/2024

We (gemma.cpp) recently started accumulating softmax terms into f64. There is at least one known case where this changes the output, but only after 200 tokens, hence unlikely to be detected in many benchmarks.
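Roughly what that looks like (a host-side sketch only, not gemma.cpp's actual code): the exp terms and the normalizer are carried in double, and results are rounded back to float only at the end.

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Sketch: softmax over fp32 logits with the normalizer (sum of exp terms)
    // accumulated in f64.
    std::vector<float> SoftmaxF64Accum(const std::vector<float>& logits) {
      const float max_logit = *std::max_element(logits.begin(), logits.end());

      double denom = 0.0;  // f64 accumulation of the softmax terms
      std::vector<float> out(logits.size());
      for (size_t i = 0; i < logits.size(); ++i) {
        const double e = std::exp(static_cast<double>(logits[i]) - max_logit);
        denom += e;
        out[i] = static_cast<float>(e);
      }
      const float inv = static_cast<float>(1.0 / denom);
      for (float& x : out) x *= inv;  // round back to fp32 only here
      return out;
    }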

Does anyone have experience with higher-precision matmul and whether it is worthwhile?

KeplerBoy 10/11/2024

I haven't looked at all implementations, but the hardware (tensor cores as well as CUDA cores) lets you accumulate at fp16 precision.
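For example, with the CUDA WMMA intrinsics the accumulator type is a template parameter, so the same fp16 inputs can be accumulated into either half or float fragments. The sketch below assumes a single 16x16x16 tile, a launch of exactly one warp, and sm_70+.

    #include <cuda_fp16.h>
    #include <mma.h>
    using namespace nvcuda;

    // One 16x16x16 tile of fp16 inputs; same multiply, two accumulator precisions.
    // Launch with exactly one warp (32 threads) on sm_70 or newer.
    __global__ void TileMatmul(const half* A, const half* B, float* C32, half* C16) {
      wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
      wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b;
      wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc32;  // fp32 accumulate
      wmma::fragment<wmma::accumulator, 16, 16, 16, half>  acc16;  // fp16 accumulate

      wmma::load_matrix_sync(a, A, 16);
      wmma::load_matrix_sync(b, B, 16);
      wmma::fill_fragment(acc32, 0.0f);
      wmma::fill_fragment(acc16, __float2half(0.0f));

      wmma::mma_sync(acc32, a, b, acc32);  // D = A*B + C, accumulated in fp32
      wmma::mma_sync(acc16, a, b, acc16);  // D = A*B + C, accumulated in fp16

      wmma::store_matrix_sync(C32, acc32, 16, wmma::mem_row_major);
      wmma::store_matrix_sync(C16, acc16, 16, wmma::mem_row_major);
    }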