Hacker News

qeternity yesterday at 10:58 PM

Yes, absolutely in deep learning. Custom fused CUDA kernels everywhere.


Replies

Scene_Cast2 today at 12:28 AM

Yep. MoE, FlashAttention, or sparse retrieval architectures for example.