We (gemma.cpp) recently started accumulating softmax terms into f64. There is at least one known case where this changes the output, but only after 200 tokens, so the difference is unlikely to be detected by many benchmarks.
Does anyone have experience with higher-precision matmul and whether it is worthwhile?
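For concreteness, a minimal sketch of the kind of change meant here: inputs and outputs stay in f32, but the softmax normalizer is accumulated in f64. This is illustrative only, not gemma.cpp's actual code, and the function name is made up.

```cpp
// Illustrative sketch -- not gemma.cpp's actual implementation.
#include <algorithm>
#include <cmath>
#include <vector>

void SoftmaxWithF64Sum(std::vector<float>& x) {
  // Standard max-subtraction for numerical stability.
  const float max_val = *std::max_element(x.begin(), x.end());

  double sum = 0.0;  // f64 accumulator: the change in question.
  for (float& v : x) {
    v = std::exp(v - max_val);       // exponentiate in f32 as before...
    sum += static_cast<double>(v);   // ...but accumulate the sum in f64
  }

  // Normalize; the division happens once, so its precision matters less
  // than the accumulated sum above.
  const float inv_sum = static_cast<float>(1.0 / sum);
  for (float& v : x) v *= inv_sum;
}
```

The point is that the only extra cost is one f32-to-f64 widening per term in the reduction, while the rest of the pipeline is unchanged.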
Isn’t 200 tokens basically nothing? Did you mean to say 2000?