Hacker News

menaerus · 06/24/2025

Wouldn't batching inference requests from multiple different users, each with a different context, simultaneously impact the inference results for each of those users?


Replies

pests · 06/24/2025

The different prompts being batched do not mathematically affect each other. When running inference, you have massive weights that need to be loaded and unloaded just to serve the current prompt and however long its context is (which might be only a few tokens). Batching lets you move the weights around less while serving the same amount of combined context.
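
A minimal sketch of why the prompts stay independent (my own illustration, assuming NumPy and a single toy weight matrix standing in for the model's weights): each row of a batched matrix multiply equals the result of running that request on its own, so batching changes throughput, not the math each user sees.

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.standard_normal((8, 8))       # shared weights, loaded once

    # Two unrelated "prompts" as activation vectors (stand-ins for per-user context).
    x_user_a = rng.standard_normal(8)
    x_user_b = rng.standard_normal(8)

    # Served separately: the weights are applied to each request on its own.
    out_a_alone = x_user_a @ W
    out_b_alone = x_user_b @ W

    # Served batched: stack the requests and apply the same weights once.
    batch = np.stack([x_user_a, x_user_b])  # shape (2, 8)
    out_batched = batch @ W

    # Each row of the batched output matches the separately computed result.
    assert np.allclose(out_batched[0], out_a_alone)
    assert np.allclose(out_batched[1], out_b_alone)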
