Hacker News
kgeist
•
yesterday at 9:46 PM
What about constrained decoding (with JSON schemas)? I noticed my vLLM instance keeps one CPU core pinned at 100%.
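For anyone unfamiliar with the term, here is a minimal sketch of what constrained decoding does at each step. It assumes a toy vocabulary and a toy stand-in for a schema (a fixed list of accepted outputs). This is not vLLM's implementation, and every name here (`VOCAB`, `VALID_OUTPUTS`, `allowed_token_ids`, `fake_logits`) is hypothetical. The point is only that the allowed-token mask is recomputed on the CPU for every generated token; whether that explains the pinned core is a guess.

```python
import math

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ['{', '}', '"ok"', ':', 'true', 'false', 'hello', '<eos>']

# Toy stand-in for a JSON schema: the complete outputs the "schema" accepts.
VALID_OUTPUTS = ['{"ok":true}', '{"ok":false}']


def allowed_token_ids(prefix):
    """Return ids of tokens that keep `prefix` extendable to a valid output."""
    if prefix in VALID_OUTPUTS:
        # The output is complete, so only end-of-sequence is allowed.
        return [VOCAB.index('<eos>')]
    return [
        i for i, tok in enumerate(VOCAB)
        if tok != '<eos>' and any(v.startswith(prefix + tok) for v in VALID_OUTPUTS)
    ]


def constrained_greedy_decode(logits_fn, max_steps=16):
    """Greedy decoding where disallowed tokens are masked to -inf each step."""
    prefix = ''
    for _ in range(max_steps):
        logits = logits_fn(prefix)                 # on a real server this comes from the model
        allowed = set(allowed_token_ids(prefix))   # per-token CPU work
        masked = [x if i in allowed else -math.inf for i, x in enumerate(logits)]
        best = max(range(len(VOCAB)), key=masked.__getitem__)
        if VOCAB[best] == '<eos>':
            break
        prefix += VOCAB[best]
    return prefix


def fake_logits(prefix):
    # Hypothetical model that always prefers 'hello', then 'false'.
    return [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 5.0, 0.5]


print(constrained_greedy_decode(fake_logits))
```

Even though the fake model always wants to emit `hello`, the mask forces a string the toy schema accepts. Real implementations compile the schema into a grammar or automaton instead of scanning a list, but the per-step mask is the same idea.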