noob question: why would increased demand result in decreased intelligence?
It would happen if they quietly decide to serve up more aggressively distilled / quantised / smaller models when under load.
I've seen issues with garbage tokens during high load (they seemed to come from a completely different session, mentioned code I've never seen before, repeated lines over and over). I suspect Anthropic has threading bugs or race conditions in their caching/inference code that only surface under very high load.
From what I understand, this can come from the batching of requests.
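To make the batching failure mode concrete, here's a minimal hypothetical sketch (not anyone's actual serving code): requests from different sessions are decoded together in one batch, and the results are scattered back to per-session streams. If the slot-to-session mapping is stale or racy, one session receives another session's tokens, which would look exactly like "garbage from a different session".

```python
def scatter_tokens(batch_tokens, session_ids, streams, buggy=False):
    """Route each decoded token in a batch back to its requesting session.

    session_ids[i] says which session owns batch slot i. The 'buggy'
    path models a stale/racy slot map that is shifted by one, so each
    session receives a neighbouring session's token.
    """
    for i, token in enumerate(batch_tokens):
        if buggy:
            # Race/off-by-one: index into the wrong slot's session.
            owner = session_ids[(i + 1) % len(session_ids)]
        else:
            owner = session_ids[i]
        streams[owner].append(token)


streams = {"A": [], "B": []}
# Slot 0 belongs to session A, slot 1 to session B.
scatter_tokens(["foo", "bar"], ["A", "B"], streams, buggy=True)
# With the bug, session A's stream now holds session B's token.
print(streams)  # {'A': ['bar'], 'B': ['foo']}
```

Real inference servers are far more complex (continuous batching, KV-cache paging), but any bug of this shape only bites when batches actually contain multiple sessions, i.e. under high load.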
An operator at load capacity can either refuse requests or turn the knobs (quantization, thinking time) so requests process faster. Both of those things make customers unhappy, but only one is obvious.