embedding-shape • today at 10:07 AM • 2 replies

> Token-limit exceeded -> empty output. Just a guess, though.

That'd be really non-obvious behavior. I'm not aware of any inference engine that works like that by default; usually you'd get everything up to the limit. Otherwise it kind of breaks the whole expectation behind setting a token limit in the first place.

Replies

GrinningFool • today at 12:00 PM

I just fixed this bug in a summarizer. Reasoning tokens were consuming the whole budget I gave it (1k), so the response came back blank. (Qwen3.5-35B-A3B)
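
A minimal sketch of how that failure mode can arise, assuming the reasoning tokens and the visible answer share one token budget and that the reasoning is generated first. It is a toy simulation, not the API of any real inference engine: `Completion`, `truncate_to_budget`, and `budget_exhausted_by_reasoning` are hypothetical names, and the `"length"` / `"stop"` finish reasons are assumed here only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Completion:
    reasoning: list[str]   # hidden "thinking" tokens
    answer: list[str]      # tokens the caller actually sees
    finish_reason: str     # "stop" or "length" (assumed convention)


def truncate_to_budget(reasoning: list[str], answer: list[str], max_tokens: int) -> Completion:
    """Apply one shared budget to reasoning followed by the answer (assumption)."""
    kept_reasoning = reasoning[:max_tokens]
    remaining = max_tokens - len(kept_reasoning)
    kept_answer = answer[:remaining]
    truncated = len(reasoning) + len(answer) > max_tokens
    return Completion(kept_reasoning, kept_answer, "length" if truncated else "stop")


def budget_exhausted_by_reasoning(completion: Completion) -> bool:
    """True when the limit was hit before any visible answer token was produced."""
    return completion.finish_reason == "length" and not completion.answer


# 1,200 reasoning tokens against a 1k budget leave nothing for the answer.
blank = truncate_to_budget(["r"] * 1200, ["a"] * 50, max_tokens=1000)
print(len(blank.answer), blank.finish_reason, budget_exhausted_by_reasoning(blank))
```

Under these assumptions the output is still "everything up until the limit"; it is just that everything before the limit was reasoning. A check like `budget_exhausted_by_reasoning` lets a caller tell that case apart from a model that genuinely had nothing to say, and then retry with a larger budget or with reasoning turned off.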

qayxc • today at 10:13 AM

This doesn't necessarily relate to the inference itself. No model is exposed to the input directly when you use a web-based API; there are pre-processing layers involved that do undocumented things in opaque ways.
