I’ve often suspected these models of getting dumber when the service is under high load, but I’ve never seen actual measurements or proof. Does anybody know of real published data here?
ChatGPT was brutal for this a couple of years ago. You could tell when it went into “lazy mode” during peak usage periods.
Suddenly, instead of writing the code you asked for, it would give you some generic bullet points telling you to find a library that does what you want and read the documentation.
Not exactly what you're looking for but https://news.ycombinator.com/item?id=46810282
Here's a recent comment [1] by an OpenAI engineer confirming that they do in fact make such trade-offs between intelligence and efficiency.
[1]: https://news.ycombinator.com/item?id=46909905