
ZsoltT • today at 2:30 AM • 1 reply

> we recommend using SGLang with excess tensor parallelism and EAGLE-3 speculative decoding on live edge Hopper/Blackwell GPUs accessed via low-overhead, prefix-aware HTTP proxies

lord

Replies

charles_irl • today at 3:46 AM

Sorry to lead with a bunch of jargon! Wanted to make it obvious that we'd give concrete recommendations instead of palaver.

The technical terms there are explained and diagrammed later, and the recommendations are derived from something close to first principles (e.g. roofline analysis).
