Hacker News

jiggawatts 10/11/2024

I thought all current implementations accumulate into fp32 instead of accumulating in fp16.
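For illustration, a minimal CUDA sketch (hypothetical code, not any particular library's kernel) that takes the same fp16 inputs and accumulates the dot product once into fp32 and once into fp16; with enough small terms the fp16 accumulator visibly loses low-order bits. Needs sm_53+ for half arithmetic.

    #include <cuda_fp16.h>
    #include <cstdio>

    // Accumulate the same fp16 dot product two ways: into fp32 and into fp16.
    __global__ void dot_compare(const __half* a, const __half* b, int n,
                                float* out_fp32, float* out_fp16) {
      float  acc32 = 0.0f;                 // fp32 accumulator
      __half acc16 = __float2half(0.0f);   // fp16 accumulator
      for (int i = 0; i < n; ++i) {
        acc32 += __half2float(a[i]) * __half2float(b[i]); // product kept in fp32
        acc16 = __hadd(acc16, __hmul(a[i], b[i]));        // rounds to fp16 each step
      }
      *out_fp32 = acc32;
      *out_fp16 = __half2float(acc16);
    }

    int main() {
      const int n = 4096;
      __half *a, *b;
      float *o32, *o16;
      cudaMallocManaged(&a, n * sizeof(__half));
      cudaMallocManaged(&b, n * sizeof(__half));
      cudaMallocManaged(&o32, sizeof(float));
      cudaMallocManaged(&o16, sizeof(float));
      for (int i = 0; i < n; ++i) {
        a[i] = __float2half(0.01f);
        b[i] = __float2half(1.0f);
      }
      dot_compare<<<1, 1>>>(a, b, n, o32, o16);
      cudaDeviceSynchronize();
      // Once the running sum grows, the fp16 accumulator can no longer absorb
      // the small increments, so the two results drift apart.
      printf("fp32 accumulate: %f\nfp16 accumulate: %f\n", *o32, *o16);
      return 0;
    }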


Replies

janwas 10/12/2024

We (gemma.cpp) recently started accumulating softmax terms into f64. There is at least one known case where this changes the output, but only after 200 tokens, hence unlikely to be detected in many benchmarks.
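Roughly what that looks like (a host-side sketch only, not gemma.cpp's actual code): the exp terms and the normalizer are carried in double, and results are rounded back to float only at the end.

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Sketch: softmax over fp32 logits with the normalizer (sum of exp terms)
    // accumulated in f64.
    std::vector<float> SoftmaxF64Accum(const std::vector<float>& logits) {
      const float max_logit = *std::max_element(logits.begin(), logits.end());

      double denom = 0.0;  // f64 accumulation of the softmax terms
      std::vector<float> out(logits.size());
      for (size_t i = 0; i < logits.size(); ++i) {
        const double e = std::exp(static_cast<double>(logits[i]) - max_logit);
        denom += e;
        out[i] = static_cast<float>(e);
      }
      const float inv = static_cast<float>(1.0 / denom);
      for (float& x : out) x *= inv;  // round back to fp32 only here
      return out;
    }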

Does anyone have experience with higher-precision matmul and whether it is worthwhile?

KeplerBoy 10/11/2024

I haven't looked at all implementations, but the hardware (tensor cores as well as CUDA cores) lets you accumulate at fp16 precision.
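For example, with the CUDA WMMA intrinsics the accumulator type is a template parameter, so the same fp16 inputs can be accumulated into either half or float fragments. The sketch below assumes a single 16x16x16 tile, a launch of exactly one warp, and sm_70+.

    #include <cuda_fp16.h>
    #include <mma.h>
    using namespace nvcuda;

    // One 16x16x16 tile of fp16 inputs; same multiply, two accumulator precisions.
    // Launch with exactly one warp (32 threads) on sm_70 or newer.
    __global__ void TileMatmul(const half* A, const half* B, float* C32, half* C16) {
      wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
      wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b;
      wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc32;  // fp32 accumulate
      wmma::fragment<wmma::accumulator, 16, 16, 16, half>  acc16;  // fp16 accumulate

      wmma::load_matrix_sync(a, A, 16);
      wmma::load_matrix_sync(b, B, 16);
      wmma::fill_fragment(acc32, 0.0f);
      wmma::fill_fragment(acc16, __float2half(0.0f));

      wmma::mma_sync(acc32, a, b, acc32);  // D = A*B + C, accumulated in fp32
      wmma::mma_sync(acc16, a, b, acc16);  // D = A*B + C, accumulated in fp16

      wmma::store_matrix_sync(C32, acc32, 16, wmma::mem_row_major);
      wmma::store_matrix_sync(C16, acc16, 16, wmma::mem_row_major);
    }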