Hacker News

fulafel today at 1:19 PM · 6 replies

"TPU 8t and TPU 8i deliver up to two times better performance-per-watt over the previous generation" sounds impressive especially as the previous generation is so recent (2025).

Interesting that there's separate inference and training focused hardware. Do companies using NV hardware also use different hardware for each task or is their compute more fungible?


Replies

mrlongroots today at 2:43 PM

That training is compute-bound and inference is memory-bound is well-known, but I don't think Nvidia deployments typically specialize for one vs the other.
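To make that split concrete, here's a back-of-the-envelope roofline sketch in Python. The peak numbers are assumptions (roughly H100-class FP16 figures) and the matmul shapes are hypothetical, so treat the output as illustrative only:

    # Roofline idea: arithmetic intensity (FLOPs per byte of memory
    # traffic) determines whether a kernel is compute- or memory-bound.
    PEAK_FLOPS = 1.0e15   # ~1 PFLOP/s dense FP16 (assumed, H100-class)
    PEAK_BW = 3.35e12     # ~3.35 TB/s HBM bandwidth (assumed)
    RIDGE = PEAK_FLOPS / PEAK_BW  # intensity needed to saturate compute

    def matmul_intensity(m, n, k, bytes_per_el=2):
        """FLOPs/byte of an (m, k) @ (k, n) matmul at FP16."""
        flops = 2 * m * n * k
        traffic = bytes_per_el * (m * k + k * n + m * n)
        return flops / traffic

    # Training: large batch, so each weight read is amortized over
    # many tokens.
    train = matmul_intensity(8192, 8192, 8192)
    # Autoregressive decode: one token at a time, so every weight byte
    # moved buys only ~2 FLOPs.
    decode = matmul_intensity(1, 8192, 8192)

    print(f"ridge point:     {RIDGE:7.1f} FLOPs/byte")
    print(f"training matmul: {train:7.1f} FLOPs/byte (compute-bound)")
    print(f"decode matmul:   {decode:7.1f} FLOPs/byte (memory-bound)")

With these assumptions the training matmul lands around 2,700 FLOPs/byte, well above the ~300 FLOPs/byte ridge point, while single-token decode sits near 1, which is why decode lives and dies by memory bandwidth.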

One reason is that most clouds/neoclouds don't own the workloads and want fungibility. Given that you're spending a lot on H200s and whatnot, it's good to also spend on the networking to make sure you can sell them to all kinds of customers. That said, the Groq LPU in Vera Rubin is an inference-specific accelerator, and Cerebras is also inference-optimized, so specialization is starting to happen.

electroly today at 2:32 PM

I can't answer for NVIDIA, but AWS has its own training and inference chips (Trainium and Inferentia), and word on the street is that the inference chips are too weak, so some companies are running inference on the training chips.

burnte today at 3:59 PM

> Interesting that there's separate inference and training focused hardware. Do companies using NV hardware also use different hardware for each task or is their compute more fungible?

Dedicated hardware will usually be faster; that's why, as certain things mature, they go from being complicated and expensive to being cheap and plentiful in $1 chips. This tells me Google has a much better grasp of their stack than people building on Nvidia, because Google owns everything from the keyboard to the silicon. They've iterated enough to understand how to separate out different functions that compete with each other for resources.

zozbot234 today at 1:40 PM

The "training" chips will probably be quite usable for slower, higher-throughput inference at scale. I expect that to be quite popular eventually for non-time-sensitive uses.

dataking today at 1:25 PM

Vera Rubin will have Groq chips focused on fast inference, so it points toward a trend. Also, with energy needs so high, why not reach for every feasible optimization?

xnx today at 1:26 PM

Nvidia said in March that they're working on specialized inference hardware, but they don't have any right now. You can run inference on Nvidia's current hardware offerings, but it's not as efficient.
