littlestymaar • today at 3:15 AM

I guess it mostly comes from running the model at batch size = 1 locally vs. at a high batch size in a datacenter, since GPU consumption doesn't grow that much with batch size.

Note that while a local chatbot user will mostly be running at batch size = 1, that won't hold if they are running an agentic framework, so the gap is going to narrow or even reverse.
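
To put rough numbers on the batch-size argument, here is a back-of-the-envelope sketch. Everything in it is made up for illustration: the linear power model, the `base_watts`, `watts_per_seq` and `step_seconds` values, and the assumption that a decode step takes the same time regardless of batch size. It only shows the shape of the effect, not measurements from any real GPU.

```python
def energy_per_token(batch_size, base_watts=300.0, watts_per_seq=2.0,
                     step_seconds=0.03):
    """Joules per generated token for one decode step (toy model).

    Assumptions (all hypothetical):
      - power draw = base_watts + watts_per_seq * batch_size,
        i.e. it grows only slowly with batch size
      - each decode step takes step_seconds no matter the batch size
      - each sequence in the batch produces one token per step
    """
    power = base_watts + watts_per_seq * batch_size
    tokens_per_step = batch_size
    return power * step_seconds / tokens_per_step


# Local chatbot (batch 1), a few parallel agent calls (batch 8),
# and a datacenter-style batch (64).
for b in (1, 8, 64):
    print(f"batch={b:3d}  joules/token={energy_per_token(b):.3f}")
```

Under these assumptions the per-token energy drops roughly in proportion to the batch size, which is why a local setup that issues several requests in parallel (as an agentic framework might) closes part of the gap.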
