Wouldn't batching the multiple inference requests from multiple different users with multiple different contexts simultaneously impact the inference results for each of those users?
The different prompts in a batch do not mathematically affect each other. When running inference, the model's massive weights have to be streamed through memory just to serve the current prompt and however long its context is (which may be only a few tokens). Batching lets you reuse each pass over the weights across many requests, so the same amount of weight movement serves more combined context.
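To make that concrete, here's a minimal NumPy sketch (the names are illustrative, not from any particular inference engine) of why batched requests can't interfere: each request occupies its own row of the batch, and a matrix multiply against the shared weights only ever reads within a row, so each user's result matches what they'd get running alone.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 64, 256
W = rng.standard_normal((d_model, d_ff))  # shared model weights

# Three "users" with independent activations (one token each, for brevity)
user_a = rng.standard_normal((1, d_model))
user_b = rng.standard_normal((1, d_model))
user_c = rng.standard_normal((1, d_model))

# Served one at a time: W is streamed through memory three times
separate = [x @ W for x in (user_a, user_b, user_c)]

# Served as one batch: W is streamed once, amortized across all three
batch = np.vstack([user_a, user_b, user_c]) @ W

# Each user's row is the same result either way (up to float rounding)
for i, single in enumerate(separate):
    assert np.allclose(batch[i], single)
```

The same independence holds inside a real transformer: attention only mixes tokens within a single sequence, never across batch entries, so the batch dimension is purely a throughput optimization.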