Hacker News

supermatt · today at 8:12 PM · 0 replies

What is the max token throughput when batching? Lots of agentic workflows (not just vibe coding) run many inferences in parallel.
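
To be concrete about what I mean by aggregate throughput: something like the sketch below, which fires N concurrent requests at an OpenAI-compatible endpoint (e.g. a local vLLM server) and divides total generated tokens by wall-clock time. The URL, model id, and concurrency level here are placeholders, not anything from the review.

    import time
    from concurrent.futures import ThreadPoolExecutor

    import requests

    # Assumed: an OpenAI-compatible server (e.g. vLLM) listening here.
    URL = "http://localhost:8000/v1/completions"
    MODEL = "your-model-id"   # placeholder, substitute the served model
    CONCURRENCY = 64          # parallel requests, i.e. the client-side "batch"
    MAX_TOKENS = 256

    def one_request(prompt: str) -> int:
        """Send one completion request and return its generated-token count."""
        resp = requests.post(URL, json={
            "model": MODEL,
            "prompt": prompt,
            "max_tokens": MAX_TOKENS,
        }, timeout=600)
        resp.raise_for_status()
        return resp.json()["usage"]["completion_tokens"]

    prompts = [f"Write a short summary of topic {i}." for i in range(CONCURRENCY)]

    start = time.time()
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        completion_tokens = sum(pool.map(one_request, prompts))
    elapsed = time.time() - start

    print(f"{completion_tokens} tokens in {elapsed:.1f}s "
          f"-> {completion_tokens / elapsed:.1f} tok/s aggregate")

Sweep CONCURRENCY (1, 8, 32, 64, ...) and you get the throughput curve that actually matters for this hardware, rather than a single-stream number.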

It seems like every time someone does an AI hardware “review” we end up with figures for just a single instance, which simply isn’t how the target demographic for a $40k cluster is going to use it.

Jeff, I love reading your reviews, but can’t help but feel this was a wasted opportunity for some serious benchmarking of LLM performance.