
thelastparadise · 10/12/2024

I wonder how LLM performance looks at higher core counts?

With recent DDR generations and many-core CPUs, perhaps CPUs will give GPUs a run for their money.


Replies

kolbe · 10/12/2024

The H100 has 16,000 CUDA cores at 1.2 GHz. My rough calculation is that it can handle 230k concurrent calculations, whereas a 192-core AVX-512 chip (assuming it operates on 16-bit data) can handle 6k concurrent calculations at 4x the frequency. So, about a 10x difference on compute alone, not to mention that memory is an even stronger advantage for GPUs.
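The arithmetic above can be sketched in a few lines, taking the comment's figures (230k GPU lanes, 6k CPU lanes, a 4x CPU clock advantage) at face value rather than as measured numbers:

```python
# Back-of-envelope compute comparison using the figures from the
# comment above (illustrative estimates, not benchmarks).
gpu_lanes = 230_000      # estimated concurrent calculations on an H100
cpu_lanes = 192 * 32     # 192 cores * 32 FP16 lanes per AVX-512 vector
cpu_clock_advantage = 4  # CPU assumed to run at roughly 4x the GPU clock

# Effective compute ratio after crediting the CPU its higher clock.
ratio = gpu_lanes / (cpu_lanes * cpu_clock_advantage)
print(f"GPU advantage on raw compute: ~{ratio:.0f}x")
```

A 512-bit vector holds 32 16-bit elements, which is where the ~6k lanes (192 × 32 = 6,144) come from; dividing out the clock advantage lands near the 10x the comment cites.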

nullc · 10/13/2024

They're memory-bandwidth limited; you can basically estimate the performance from the time it takes to read the entire model from RAM for each token.
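That rule of thumb reduces to a one-line estimate: tokens/sec ≈ memory bandwidth ÷ bytes streamed per token (roughly the model's size in memory). The bandwidth and model-size figures below are illustrative assumptions (rough vendor peak numbers), not measurements:

```python
def tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper-bound token rate if generating each token requires streaming
    the whole model through memory once (the bandwidth-limited regime)."""
    return bandwidth_gb_s / model_size_gb

model_gb = 70.0  # e.g. a 70B-parameter model at 8-bit weights (assumption)

# Illustrative peak bandwidths: H100 SXM HBM3 ~3350 GB/s; a 12-channel
# DDR5-4800 server socket ~460 GB/s. Real sustained rates are lower.
print(f"GPU ceiling: ~{tokens_per_second(3350, model_gb):.0f} tok/s")
print(f"CPU ceiling: ~{tokens_per_second(460, model_gb):.0f} tok/s")
```

On these numbers the bandwidth gap (roughly 7x here) tracks the generation-speed gap directly, which is why batch size 1 inference speed barely depends on compute at all.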