Does anyone actually run a modest-sized app and can share numbers on what one GPU gets you? Assuming...

ivape • last Wednesday at 12:54 PM

Does anyone actually run a modest-sized app and have numbers to share on what one GPU gets you? Assuming something like vLLM for concurrent requests, what kind of throughput are you seeing? Serving an LLM just feels like a nightmare.
