Three types of LLM workloads and how to serve them

69 points • by charles_irl • yesterday at 4:15 PM • 4 comments

Comments

rippeltippel • yesterday at 10:26 PM

> Gallia est omnis divisor in partes tres.

OCD-driven fix: The correct Latin quote is "Gallia est omnis divisa in partes tres".

> we recommend using SGLang with excess tensor parallelism and EAGLE-3 speculative decoding on live edge Hopper/Blackwell GPUs accessed via low-overhead, prefix-aware HTTP proxies

lord
