All of these systems use massive pools of GPUs, and allocate many requests to each node. The “slow it down” knob is to steer a request to nodes with more concurrent requests; “speed it up” is to route to less-loaded nodes.
Right, but that's still not Anthropic adding an intentional delay for the sole purpose of having you pay more to remove it.
Right, but that's still not Anthropic adding an intentional delay for the sole purpose of having you pay more to remove it.