Hacker News

stavros yesterday at 10:12 PM

This makes no sense. It's not like they have a "slow it down" knob; they're probably parallelizing your request so you get a 2.5x speedup at 10x the price.


Replies

brookst today at 12:19 AM

All of these systems use massive pools of GPUs, and allocate many requests to each node. The “slow it down” knob is to steer a request to nodes with more concurrent requests; “speed it up” is to route to less-loaded nodes.
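
To make that concrete, here is a rough sketch of load-based routing. Everything in it is made up for illustration (the `Node` class, `pick_node`, the `"fast"`/`"standard"` tier names, and the `MAX_CONCURRENT` cap); it is not how any particular provider's scheduler works, just the general shape of the idea.

```python
from dataclasses import dataclass

# Hypothetical cap on how many requests one node serves at once.
MAX_CONCURRENT = 8


@dataclass
class Node:
    name: str
    active_requests: int  # requests currently sharing this node


def pick_node(nodes, tier):
    """Pick a node for a new request based on its (hypothetical) tier.

    "fast" goes to the least-loaded node, so the request shares the
    hardware with fewer neighbors. Anything else goes to the busiest
    node that still has room, which packs requests more densely.
    """
    candidates = [n for n in nodes if n.active_requests < MAX_CONCURRENT]
    if not candidates:
        raise RuntimeError("no node has spare capacity")
    if tier == "fast":
        chosen = min(candidates, key=lambda n: n.active_requests)
    else:
        chosen = max(candidates, key=lambda n: n.active_requests)
    chosen.active_requests += 1
    return chosen


nodes = [Node("a", 1), Node("b", 5), Node("c", 7)]
fast_choice = pick_node(nodes, "fast")          # least-loaded node
standard_choice = pick_node(nodes, "standard")  # busiest node with room
```

The only "knob" here is which end of the load ranking a request is sent to; the per-request speed difference would come from how many other requests share the same node.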
