I'm excited to see what cuTile-rs unlocks. I like the direction of HuggingFace's grout project (https://github.com/huggingface/grout) for local LLM inference:
- state-of-the-art performance
- codebase that fits in a context window (including kernel definitions!)
- single-binary deployment
It's similar to antirez's ds4.c, but in Rust, and with cuTile making kernels both easier to author and higher-performing.