Hacker News

janwas 10/12/2024

We (gemma.cpp) recently started accumulating softmax terms into f64. There is at least one known case of this causing differing output, but only after 200 tokens, so it is unlikely to be detected in many benchmarks.
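
For readers unfamiliar with the idea, here is a minimal sketch of what "accumulating softmax terms into f64" could look like. It is not gemma.cpp's actual code: the function names `SoftmaxF64Accum` and `SoftmaxF32Accum` are hypothetical, and the scalar loop over a `std::vector<float>` is an assumption made for brevity. The inputs and outputs stay f32; only the running sum of exponentials changes type.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (not gemma.cpp's API): in-place softmax over f32
// values, with the sum of exponentials accumulated in f64.
inline void SoftmaxF64Accum(std::vector<float>& x) {
  assert(!x.empty());
  // Subtract the maximum so that exp() cannot overflow.
  const float max_val = *std::max_element(x.begin(), x.end());
  double sum = 0.0;  // f64 accumulator: the only change vs. the f32 variant
  for (float& v : x) {
    v = std::exp(v - max_val);
    sum += static_cast<double>(v);
  }
  const double inv_sum = 1.0 / sum;
  for (float& v : x) {
    v = static_cast<float>(static_cast<double>(v) * inv_sum);
  }
}

// Same computation with an f32 accumulator, for comparison only.
inline void SoftmaxF32Accum(std::vector<float>& x) {
  assert(!x.empty());
  const float max_val = *std::max_element(x.begin(), x.end());
  float sum = 0.0f;  // rounding error can grow with the number of terms
  for (float& v : x) {
    v = std::exp(v - max_val);
    sum += v;
  }
  const float inv_sum = 1.0f / sum;
  for (float& v : x) {
    v *= inv_sum;
  }
}
```

Running both variants on the same logits and comparing the results element by element is one way to see where the two accumulators start to disagree. For short inputs they should agree to within f32 rounding; any gap would be expected to show up mainly when many terms are summed, such as over a large vocabulary.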

Does anyone have experience with higher-precision matmul and whether it is worthwhile?


Replies

ComputerGuru 10/12/2024

Isn’t 200 tokens basically nothing? Did you mean to say 2000?
