That's also called slowing down the default experience so users have to pay more for the fast mode. I think it's the first time we're seeing such a blatant speed ransom in LLMs.
That's not how this works. LLM serving at scale processes multiple requests in parallel for efficiency. Reduce the parallelism and you can process individual requests faster, but the overall token throughput of the server is lower.
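Here's a toy model of the trade-off (all numbers made up, and the function and parameter names are just for illustration, but the shape is roughly right for memory-bandwidth-bound decoding, where each step costs a fixed weight-read time plus a small per-sequence term):

    # Toy model of batched LLM decoding. Illustrative numbers only,
    # not measurements from any real system.

    def step_time_ms(batch_size, base_ms=20.0, per_seq_ms=0.5):
        # One decode step: fixed cost to stream the weights through
        # the GPU, plus a small marginal cost per sequence in the batch.
        return base_ms + per_seq_ms * batch_size

    for batch in (1, 8, 64):
        t = step_time_ms(batch)
        per_request_tok_s = 1000.0 / t               # tokens/s one user sees
        aggregate_tok_s = batch * per_request_tok_s  # tokens/s server-wide
        print(f"batch={batch:3d}  per-request={per_request_tok_s:6.1f} tok/s  "
              f"aggregate={aggregate_tok_s:7.1f} tok/s")

Under these assumed numbers, batch=1 gives each user ~49 tok/s but the server only ~49 tok/s total, while batch=64 drops each user to ~19 tok/s but pushes the server to ~1200 tok/s. That's the point: a "fast mode" with less batching isn't free, it costs aggregate throughput.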