That's not how this works. LLM serving at scale processes many requests in parallel for efficiency. Reduce the parallelism and individual requests finish faster, but the overall throughput in tokens per second drops.
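To make the trade-off concrete, here's a toy Python sketch with made-up numbers (the cost model in `step_time_ms` is purely hypothetical, not from any real deployment): each decode step gets somewhat slower as the batch grows, but the per-request cost is amortized, so per-user speed falls while aggregate throughput rises.

```python
def step_time_ms(batch_size: int) -> float:
    """Hypothetical per-step latency: fixed overhead plus a small
    per-request cost (illustrative numbers only)."""
    return 20.0 + 0.5 * batch_size

for batch_size in (1, 8, 32, 128):
    t = step_time_ms(batch_size)
    per_request_tps = 1000.0 / t                   # tokens/sec one user sees
    aggregate_tps = batch_size * per_request_tps   # tokens/sec across the GPU
    print(f"batch={batch_size:>3}  step={t:5.1f} ms  "
          f"per-request={per_request_tps:5.1f} tok/s  "
          f"total={aggregate_tps:7.1f} tok/s")
```

With these numbers, a batch of 1 gives each user ~49 tok/s but the GPU only ~49 tok/s total, while a batch of 128 drops each user to ~12 tok/s but the GPU pushes ~1500 tok/s overall.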
They can now easily slow down the normal mode, and then users will have to pay more for the fast mode.