Yes, absolutely, in deep learning. Custom fused CUDA kernels are everywhere.
Yep. MoE, FlashAttention, or sparse retrieval architectures for example.