Hacker News

kleton · today at 1:25 PM

Don't inference servers like vllm or sglang just translate these things to openai-compat API shapes?


Replies

ethan_smith · today at 8:21 PM

They do, but that's kind of the article's point: someone still has to write and maintain the per-model chat template and tool-call parsing inside vllm/sglang. Every time a new model ships with a slightly different format, the inference server needs an update. The M×N problem doesn't disappear; it just gets pushed one layer down.
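To make the point concrete, here's a minimal sketch of what that per-model layer looks like. The two templates below are simplified approximations of ChatML-style and Llama-2-chat-style prompt formats (not the exact official templates), and the function names are made up for illustration. One OpenAI-shaped message list in, N hand-maintained renderers out:

```python
# One OpenAI-compatible request shape...
MESSAGES = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi!"},
]

def render_chatml(messages):
    # ChatML-style: each turn wrapped in im_start/im_end markers,
    # ending with an open assistant turn as the generation prompt.
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
             for m in messages]
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

def render_llama2(messages):
    # Llama-2-chat-style: the system prompt is folded into the
    # first user turn inside <<SYS>> markers -- a structurally
    # different layout, so it needs its own renderer.
    system = next((m["content"] for m in messages
                   if m["role"] == "system"), "")
    user = next(m["content"] for m in messages if m["role"] == "user")
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

# ...and a registry of per-model renderers someone has to keep in sync
# with every new release. Tool-call output parsing is the same story
# in reverse.
TEMPLATES = {"chatml": render_chatml, "llama2": render_llama2}
for name, render in TEMPLATES.items():
    print(f"--- {name} ---")
    print(render(MESSAGES))
```

Each new model with a tweaked format means another entry in that registry, which is exactly the maintenance burden that lands on the inference servers.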