Stockfish is a machine learning system, it seems quite plausible you might be getting slapped with the silent performance degradation (https://news.ycombinator.com/item?id=48467896).
Doesn't this "silent degredation" prevent any actual evaluation of the model? If the model fails at something, this allows anyone to claim that it failed due to degradation.
It’s possible this is happening at a technical level, but I have a hard time believing this is in the spirit of what Anthropic intends to throttle. It isn’t chip design or building out a competitor to Claude.
Stockfish does use neural nets but they are tiny, on the order of 10M params. Frontier LLMs are probably 100k or 1M times larger than that.
No, since it's a silent failure, it's not plausible. We have to assume all results we get are the actual model performance, because, it's the actual model performance as we understand it.
Someone trying to solve similar problems will have similar results if the "silent failure" applies consistently in aggregate. So, this is the model's performance.
Them silently nerfing the model without telling you, and still fully charging for it, is a new low and should probably be illegal.