Hacker News

kgeist, yesterday at 9:46 PM

What about constrained decoding (with JSON schemas)? I noticed my vLLM instance keeps one CPU core at 100%.
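One plausible explanation, not a diagnosis: constrained decoding has to decide, at every generation step, which tokens are still allowed by the grammar derived from the schema. That decision is made by host-side code, and the model's logits are then masked before sampling. Below is a minimal sketch of that loop. It assumes a toy vocabulary and a hypothetical "schema" that only accepts a JSON boolean. `VOCAB`, `VALID_OUTPUTS`, `allowed_mask`, and `fake_logits` are illustrative names, not vLLM APIs. Real backends precompile much of the grammar work, but the per-step masking still happens for every token.

```python
import math
from typing import Callable

# Hypothetical toy vocabulary; real tokenizers have on the order of 100k entries.
VOCAB = ["t", "tr", "ue", "f", "al", "se", "x", "{", "}", "<eos>"]

# Hypothetical toy "schema": the output must be a JSON boolean literal.
VALID_OUTPUTS = ["true", "false"]


def allowed_mask(prefix: str) -> list[bool]:
    """For every token, report whether emitting it keeps the output a valid prefix.

    This scan over the whole vocabulary runs in plain host code at every step.
    """
    mask = []
    for tok in VOCAB:
        if tok == "<eos>":
            # End-of-sequence is only allowed once the output is complete.
            mask.append(prefix in VALID_OUTPUTS)
        else:
            candidate = prefix + tok
            mask.append(any(v.startswith(candidate) for v in VALID_OUTPUTS))
    return mask


def constrained_greedy_decode(logits_fn: Callable[[str], list[float]],
                              max_steps: int = 10) -> str:
    prefix = ""
    for _ in range(max_steps):
        logits = logits_fn(prefix)
        mask = allowed_mask(prefix)
        if not any(mask):
            raise RuntimeError(f"no valid continuation for {prefix!r}")
        # Disallowed tokens get -inf so greedy selection can never pick them.
        masked = [l if ok else -math.inf for l, ok in zip(logits, mask)]
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        if VOCAB[best] == "<eos>":
            break
        prefix += VOCAB[best]
    return prefix


def fake_logits(prefix: str) -> list[float]:
    # Stand-in for a model: it strongly prefers "x", "{" and "}",
    # which the grammar never allows.
    return [0.1, 0.5, 0.2, 0.4, 0.3, 0.3, 5.0, 4.0, 4.0, 0.0]


print(constrained_greedy_decode(fake_logits))  # prints: true
```

Even though the unconstrained "model" would emit `x`, the mask forces a schema-valid answer. The cost of building the mask grows with vocabulary size and grammar complexity and is paid once per generated token, which is consistent with a single busy CPU core during structured generation.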