Hacker News

yu3zhou4, yesterday at 5:35 AM

I’m recreating a tiny version of vLLM (a high-throughput LLM inference server) from scratch in C++ and CUDA.

https://github.com/jmaczan/tiny-vllm