They say they are using https://github.com/tile-ai/TileRT
- persistent CUDA kernel
- tiled processing with overlapping read/writes
- model designed with specific constraints in mind
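
For anyone unfamiliar with the first two points, here is a rough CPU-side analogy in Python. It is not taken from TileRT. The function name `persistent_tiled_sum` and its parameters are made up for illustration, and threads stand in for GPU thread blocks. The idea it sketches: workers are launched once and keep pulling tiles from a shared counter, instead of being relaunched for every piece of work. It does not model the overlapping of reads and writes.

```python
import threading

def persistent_tiled_sum(data, tile_size=4, num_workers=3):
    """Sum `data` tile by tile using workers that are launched only once.

    Hypothetical illustration: threads play the role of GPU thread blocks.
    """
    num_tiles = (len(data) + tile_size - 1) // tile_size
    next_tile = 0                  # shared work counter (an atomic add on a GPU)
    lock = threading.Lock()
    partials = [0] * num_workers   # one accumulator per worker

    def worker(worker_id):
        nonlocal next_tile
        while True:                # the "persistent" loop: no relaunch per tile
            with lock:             # claim the next unprocessed tile
                tile = next_tile
                next_tile += 1
            if tile >= num_tiles:  # no work left, the worker exits
                return
            start = tile * tile_size
            partials[worker_id] += sum(data[start:start + tile_size])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)
```

Calling `persistent_tiled_sum(list(range(10)))` splits the ten values into three tiles and adds up the per-worker partial sums. On a real GPU the payoff comes from avoiding repeated kernel launches and keeping state on-chip between tiles, which this analogy only gestures at.
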
Excuse me, do aliens live among us? 17 commits, 99% Python, and it multiplies the speed of GLM, Deepseek V4, and MiMO 2.5?