Hacker News

jasonjmcghee yesterday at 3:57 PM

> AlphaEvolve achieved up to a 32.5% speedup for the FlashAttention kernel implementation in Transformer-based AI models

> In roughly 75% of cases, it rediscovered state-of-the-art solutions, to the best of our knowledge.

> And in 20% of cases, AlphaEvolve improved the previously best known solutions

These sound like incredible results. I'd be curious what kind of improvements were made / what the changes actually were.

Like, was that "up to a 32.5% speedup" on some weird edge case, with negligible speedup otherwise? Would love to see the benchmarks.


Replies

schmidtleonard yesterday at 4:10 PM

Remember that GPUs have cache hierarchies and matching block sizes to optimally hit those caches is a big win that you often don't get by default, just because the number of important kernels times important GPUs times effort to properly tune one is greater than what people are willing to do for others for free in open source. Not to mention kernel fusion and API boundaries that socially force suboptimal choices for the sake of clarity and simplicity.

It's a very impressive result: not magic, but also not cheating!
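For a concrete picture of what "matching block sizes to the cache hierarchy" means, here is a minimal Python sketch of the tuning loop (plain NumPy, with made-up matrix and tile sizes). On a real GPU the block size governs shared-memory and register usage and interacts with fusion decisions, so the actual search space is much larger; this only shows the shape of the sweep a kernel autotuner runs.

```python
import time
import numpy as np

def blocked_matmul(a: np.ndarray, b: np.ndarray, block: int) -> np.ndarray:
    """Multiply a @ b one (block x block) tile at a time.

    The tile size is the tunable parameter. In this toy CPU version most of
    the cost is Python loop overhead; on a GPU kernel the block size
    determines shared-memory and register usage, which is where the wins
    come from.
    """
    n, k = a.shape
    k2, m = b.shape
    assert k == k2
    out = np.zeros((n, m), dtype=a.dtype)
    for i in range(0, n, block):
        for j in range(0, m, block):
            for p in range(0, k, block):
                out[i:i + block, j:j + block] += (
                    a[i:i + block, p:p + block] @ b[p:p + block, j:j + block]
                )
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((1024, 1024), dtype=np.float32)
    b = rng.standard_normal((1024, 1024), dtype=np.float32)

    # Sweep candidate tile sizes and keep the fastest -- the same loop a
    # kernel autotuner runs, just over a toy search space.
    for block in (32, 64, 128, 256, 512):
        t0 = time.perf_counter()
        c = blocked_matmul(a, b, block)
        dt = time.perf_counter() - t0
        print(f"block={block:4d}  {dt * 1e3:7.1f} ms")

    print("max abs error vs a @ b:", np.abs(c - a @ b).max())
```

The point of the comment above is that nobody runs this kind of sweep by hand for every kernel × GPU combination, which is exactly where automated search can pick up "free" wins.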

cavisne today at 2:42 AM

From the paper, it was a speedup on the XLA GPU kernel they wrote using Jax, which is probably not SOTA. I don't think Jax even has an official flash attention implementation.
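For context on what a non-flash baseline looks like: a straightforward attention implementation materializes the full (L, L) score matrix, which is exactly the memory traffic FlashAttention avoids by streaming over key/value tiles. A minimal NumPy sketch of that naive version (shapes and names are illustrative, not taken from the paper's kernel):

```python
import numpy as np

def naive_attention(q, k, v):
    """Baseline attention: materializes the full (L, L) score matrix.

    This is roughly what a compiler sees when attention is written as plain
    array ops; FlashAttention instead processes key/value tiles so the full
    score matrix never has to be written out to memory.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                  # (L, L) -- the memory hog
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                             # (L, d)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, d = 512, 64                                 # illustrative sizes
    q, k, v = (rng.standard_normal((L, d), dtype=np.float32) for _ in range(3))
    print(naive_attention(q, k, v).shape)          # (512, 64)
```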

cubefox yesterday at 4:35 PM

> AlphaEvolve is accelerating AI performance and research velocity. By finding smarter ways to divide a large matrix multiplication operation into more manageable subproblems, it sped up this vital kernel in Gemini’s architecture by 23%, leading to a 1% reduction in Gemini's training time.
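As a rough illustration of "dividing a large matrix multiplication operation into more manageable subproblems" (not the decomposition AlphaEvolve actually found, which the post doesn't spell out), Strassen's classic scheme splits each operand into four quadrants and gets away with 7 block multiplications instead of 8:

```python
import numpy as np

def strassen(a: np.ndarray, b: np.ndarray, leaf: int = 64) -> np.ndarray:
    """Strassen's divide-and-conquer matmul for square power-of-two matrices.

    Each level trades 8 block multiplications for 7 (plus extra additions),
    the textbook example of a smarter division into subproblems.
    """
    n = a.shape[0]
    if n <= leaf:                       # fall back to the BLAS call for small blocks
        return a @ b
    h = n // 2
    a11, a12, a21, a22 = a[:h, :h], a[:h, h:], a[h:, :h], a[h:, h:]
    b11, b12, b21, b22 = b[:h, :h], b[:h, h:], b[h:, :h], b[h:, h:]

    m1 = strassen(a11 + a22, b11 + b22, leaf)
    m2 = strassen(a21 + a22, b11, leaf)
    m3 = strassen(a11, b12 - b22, leaf)
    m4 = strassen(a22, b21 - b11, leaf)
    m5 = strassen(a11 + a12, b22, leaf)
    m6 = strassen(a21 - a11, b11 + b12, leaf)
    m7 = strassen(a12 - a22, b21 + b22, leaf)

    top = np.hstack([m1 + m4 - m5 + m7, m3 + m5])
    bottom = np.hstack([m2 + m4, m1 - m2 + m3 + m6])
    return np.vstack([top, bottom])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((256, 256))
    b = rng.standard_normal((256, 256))
    print("max abs error vs a @ b:", np.abs(strassen(a, b) - a @ b).max())
```

In a production kernel the division is usually about carving the problem so each subproblem fits the hardware rather than reducing the multiplication count, but the framing in the quote is the same divide-into-subproblems idea.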

Amekedl yesterday at 4:53 PM

Lately I'm thinking numbers like this are really just slop.

FA achieving a 32.5% speedup? Cool.

Why not submit it as a PR to the Flash Attention repo then? Can I read about it in more detail?
