noob question: why would increased demand result in decreased intelligence?

megabless123 • yesterday at 3:54 PM • 4 replies

Replies

An operator at load capacity can either refuse requests, or move the knobs (quantization, thinking time) so requests process faster. Both of those things make customers unhappy, but only one is obvious.

vidarh • yesterday at 4:01 PM
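
To make the two options above concrete (refuse requests, or turn the knobs), here is a minimal sketch of an admission policy. Everything in it is hypothetical: the `Decision` fields, the `admit` function, and the specific numbers are made up for illustration and do not describe how any real provider works.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    accepted: bool
    precision_bits: int    # hypothetical knob: weight precision
    thinking_tokens: int   # hypothetical knob: reasoning budget


def admit(load: float, degrade_when_full: bool) -> Decision:
    """Decide how to handle a request at a given utilization (1.0 = full).

    All thresholds and knob values here are illustrative assumptions.
    """
    if load < 1.0:
        # Spare capacity: serve at full quality.
        return Decision(True, 16, 4096)
    if degrade_when_full:
        # At capacity, option 2: still answer, but cheaper and faster.
        # The customer sees a normal response, only a worse one.
        return Decision(True, 8, 1024)
    # At capacity, option 1: refuse. The customer sees an explicit error.
    return Decision(False, 0, 0)
```

With `degrade_when_full=True` the request always succeeds, which is why that option is the less obvious of the two from the customer's side.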

It would happen if they quietly decide to serve up more aggressively distilled / quantised / smaller models when under load.

awestroke • yesterday at 4:06 PM
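
For anyone wondering why a more aggressively quantised model would be worse at all: storing weights in fewer bits means rounding them to a coarser grid. A toy sketch, assuming simple symmetric uniform quantization (real schemes are more sophisticated, and the `weights` values below are made up):

```python
def quantize_roundtrip(values, bits):
    """Round each value to a signed grid with `bits` bits, then map it back."""
    levels = 2 ** (bits - 1) - 1                     # e.g. 127 for 8 bits
    scale = max(abs(v) for v in values) / levels     # grid step size
    return [round(v / scale) * scale for v in values]


def max_error(values, bits):
    """Largest absolute difference introduced by the round trip."""
    restored = quantize_roundtrip(values, bits)
    return max(abs(a - b) for a, b in zip(values, restored))


weights = [0.013, -0.42, 0.27, 0.9, -0.051]  # illustrative values only

for bits in (8, 4):
    print(bits, "bits -> max rounding error", max_error(weights, bits))
```

The 4-bit grid is much coarser than the 8-bit one, so small weights such as 0.013 collapse to zero. Whether that loss is noticeable in a model's output depends on the model and the quantization method.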

I've seen some issues with garbage tokens during high load: output that seemed to come from a completely different session, mentioned code I've never seen before, or repeated lines over and over. I suspect Anthropic has some threading bugs or race conditions in their caching/inference code that only show up under very high load.

Wheaties466 • yesterday at 4:03 PM

From what I understand, this can come from the batching of requests.
