
cavisne today at 2:42 AM

From the paper, the speedup was over the XLA GPU kernel they wrote using JAX, which is probably not SOTA. I don't think JAX even has an official flash attention implementation.
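
For context, here's a rough sketch (my own, with made-up names and shapes, not the paper's actual code) of the kind of attention you get when you write it directly in JAX and let XLA lower it, as opposed to a fused flash-attention kernel:

    import jax.numpy as jnp
    from jax import nn

    def naive_attention(q, k, v):
        # q, k, v: (batch, seq, heads, head_dim) -- shapes are arbitrary here.
        # Written in plain jnp ops, this is what XLA lowers for you: it builds
        # the full (seq, seq) score matrix in memory rather than streaming it
        # the way a hand-tuned flash-attention kernel would.
        scores = jnp.einsum("bthd,bshd->bhts", q, k) / jnp.sqrt(q.shape[-1])
        weights = nn.softmax(scores, axis=-1)
        return jnp.einsum("bhts,bshd->bthd", weights, v)

A fused kernel avoids materializing that score matrix, which is where most of the flash-attention speedup comes from.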


Replies

yarri today at 3:58 PM

Not sure what “official” means, but I'd direct you to the GCP MaxText framework [0]. It isn't what this GDM paper is referring to, but the repo contains various attention implementations in MaxText/layers/attentions.py.

[0] https://github.com/AI-Hypercomputer/maxtext