Hacker News

Three types of LLM workloads and how to serve them

69 points by charles_irl yesterday at 4:15 PM | 4 comments

Comments

rippeltippel yesterday at 10:26 PM

> Gallia est omnis divisor in partes tres.

OCD-driven fix: The correct Latin quote is "Gallia est omnis divisa in partes tres".

ZsoltT today at 2:30 AM

> we recommend using SGLang with excess tensor parallelism and EAGLE-3 speculative decoding on live edge Hopper/Blackwell GPUs accessed via low-overhead, prefix-aware HTTP proxies

lord
