
magicalhippo • today at 8:38 AM

> If one is quantising and another isn't there's a big difference in quality.

Sure. But the problem is that you have to test continuously to have any real confidence, which is expensive. For example, a provider could at any point start routing some fraction of requests to a quantized model, either due to a "routing error", as Anthropic called one of their model degradation events, or to improve their bottom line.

There's really no good way to detect this with just a handful of prompts, and testing at scale means overspending significantly, because the providers are all black boxes.
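
To put rough numbers on "expensive", here's a back-of-the-envelope sketch of how many probe requests you'd need to notice a partial switch to a quantized model. It assumes you already know the full-precision model's pass rate on some benchmark prompt set and uses a standard one-sided normal-approximation test for a proportion. The function name `probes_needed` and the pass rates in the example are made up for illustration, not measurements from any provider.

```python
import math
from statistics import NormalDist


def probes_needed(p_full, p_quant, fraction, alpha=0.05, power=0.8):
    """Approximate number of probe requests needed to detect that
    `fraction` of traffic is served by a lower-quality model.

    p_full   -- assumed pass rate of the full-precision model (hypothetical)
    p_quant  -- assumed pass rate of the quantized model (hypothetical)
    fraction -- share of requests routed to the quantized model
    """
    # Pass rate you would actually observe with mixed routing.
    p_mixed = (1 - fraction) * p_full + fraction * p_quant
    delta = p_full - p_mixed
    if delta <= 0:
        raise ValueError("mixed pass rate must be below the baseline")

    # One-sided test of an observed proportion against a known baseline.
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(p_full * (1 - p_full))
                 + z_beta * math.sqrt(p_mixed * (1 - p_mixed)))
    return math.ceil((numerator / delta) ** 2)


if __name__ == "__main__":
    # Hypothetical pass rates: 90% full precision, 80% quantized.
    for fraction in (1.0, 0.5, 0.1):
        print(fraction, probes_needed(0.90, 0.80, fraction))
```

Under these assumptions the required probe count grows roughly with the inverse square of the routed fraction, so a provider quantizing only a small share of traffic is much more expensive to catch than one quantizing everything. And you'd have to repeat the whole run regularly, since the routing can change at any time.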
