"I used AI to write a GPU-only MoE forward and backward pass to supplement the manual implementation in PyTorch that only supported a few specific GPUs" -> https://github.com/lostmsu/grouped_mm_bf16 100% vibe coded.