An operator at load capacity can either refuse requests or turn the knobs (quantization, thinking time) so each request is processed faster. Both make customers unhappy, but only one is obvious to them.
I'd wager that lower tok/s and lower output quality are two very different knobs to turn.
Is this intentional? I think delivering lower quality than what was advertised and benchmarked is borderline fraud, but YMMV.