Remember that GPUs have cache hierarchies, and matching block sizes to those caches is a big win that you often don't get by default: the number of important kernels, times the number of important GPUs, times the effort to properly tune each combination, adds up to more than people are willing to do for others for free in open source. Not to mention kernel fusion, and API boundaries that socially force suboptimal choices for the sake of clarity and simplicity.
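To make the block-size point concrete, here's a rough pure-Python sketch (not from any real kernel library; the tile sizes, timing approach, and function names are all illustrative assumptions). The idea is the same as on a GPU: a blocked matmul reuses each loaded tile many times, and the tile size that best fits the cache varies by chip, so it has to be tuned per hardware rather than hardcoded.

```python
# Illustrative sketch of cache blocking + per-hardware autotuning.
# All names here are hypothetical, not from any particular library.
import time

def blocked_matmul(A, B, n, bs):
    """Multiply two n x n matrices (stored as flat lists) using bs x bs tiles.

    Tiling keeps a small working set hot in cache: each tile of A and B
    is reused across many output elements before being evicted.
    """
    C = [0.0] * (n * n)
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                for i in range(ii, min(ii + bs, n)):
                    for k in range(kk, min(kk + bs, n)):
                        a = A[i * n + k]
                        for j in range(jj, min(jj + bs, n)):
                            C[i * n + j] += a * B[k * n + j]
    return C

def autotune(A, B, n, candidates=(8, 16, 32, 64)):
    """Pick the fastest tile size by timing each candidate once.

    Real autotuners are fancier (warmup runs, search spaces, caching
    results per device), but the principle is just: measure and pick.
    """
    best_bs, best_t = None, float("inf")
    for bs in candidates:
        t0 = time.perf_counter()
        blocked_matmul(A, B, n, bs)
        elapsed = time.perf_counter() - t0
        if elapsed < best_t:
            best_bs, best_t = bs, elapsed
    return best_bs

if __name__ == "__main__":
    n = 64
    A = [float(i % 7) for i in range(n * n)]
    B = [float(i % 5) for i in range(n * n)]
    print("best tile size on this machine:", autotune(A, B, n))
```

The best tile size it prints will differ between machines, which is exactly the point: tuning this per kernel, per device, is the work that rarely gets done for free.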
It's a very impressive result: not magic, but also not cheating!
Absolutely - not arguing that the results are unreasonable to the point of illegitimacy - just curious to see when they perform as well as reported, how well the presented solutions generalize to different test cases, and whether it's routing to different solutions based on certain criteria.
100%. LLMs are extremely useful for doing obvious but repetitive optimizations that a human might miss.
Hey, do you have any suggestions for resources to learn more about this kind of custom optimisation? It sounds interesting, but I'm not sure where to start.