Hacker News

jychang, last Sunday at 10:27 AM (2 replies)

You're most likely bottlenecked by memory bandwidth when running an LLM.

The AMD Ryzen AI Max+ 395 gives you 256GB/s. The M4 gives you 120GB/s, and the M4 Pro gives you 273GB/s. The M4 Max gives you 410GB/s (14‑core CPU/32‑core GPU) or 546GB/s (16‑core CPU/40‑core GPU).
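To put those numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes decoding is purely bandwidth-bound and that every generated token has to stream all model weights from memory once, so the result is a ceiling rather than a prediction. The 40 GB model size and the function name are hypothetical, chosen only for illustration.

```python
# Bandwidth figures in GB/s, as quoted in the comment above.
BANDWIDTH_GBPS = {
    "AMD Ryzen AI Max+ 395": 256,
    "M4": 120,
    "M4 Pro": 273,
    "M4 Max (32-core GPU)": 410,
    "M4 Max (40-core GPU)": 546,
}


def decode_tokens_per_sec_ceiling(bandwidth_gbps, model_size_gb):
    """Upper bound on decode speed if each token reads all weights once."""
    return bandwidth_gbps / model_size_gb


# Assumption: a hypothetical model whose weights occupy 40 GB in memory.
MODEL_SIZE_GB = 40.0

for name, bandwidth in BANDWIDTH_GBPS.items():
    ceiling = decode_tokens_per_sec_ceiling(bandwidth, MODEL_SIZE_GB)
    print(f"{name}: at most {ceiling:.1f} tokens/s")
```

Real throughput will land below these ceilings, since the estimate ignores KV cache reads, compute time, and any overhead in the runtime.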


Replies

zargon, last Sunday at 5:43 PM

It’s both. If you’re using any real amount of context, you need compute too.
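One way to see why compute matters once context grows is a simple roofline-style estimate. The sketch below assumes roughly 2 FLOPs per parameter per token for the weight matmuls, and that weights are read from memory once per pass no matter how many prompt tokens share that pass. It ignores attention over the KV cache, which adds further cost as context grows. The function name and the 30 TFLOPS peak are hypothetical placeholders, not measured figures for any chip.

```python
def breakeven_tokens(peak_tflops, bandwidth_gbps, bytes_per_param):
    """Tokens per pass at which weight matmuls shift from bandwidth-bound
    to compute-bound, under the simplifying assumptions stated above."""
    # FLOPs the hardware can perform per byte it can fetch from memory.
    hw_flops_per_byte = (peak_tflops * 1e12) / (bandwidth_gbps * 1e9)
    # FLOPs one token needs per byte of weights read (~2 FLOPs per parameter).
    flops_per_byte_per_token = 2.0 / bytes_per_param
    return hw_flops_per_byte / flops_per_byte_per_token


# Assumptions: hypothetical 30 TFLOPS peak, 273 GB/s, 4-bit weights (0.5 bytes each).
n = breakeven_tokens(peak_tflops=30.0, bandwidth_gbps=273.0, bytes_per_param=0.5)
print(f"compute-bound above roughly {n:.0f} tokens per pass")
```

Single-token decoding sits far below that threshold, which is why it tends to be bandwidth-bound, while processing a long prompt in one pass sits far above it and leans on compute.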

cdavid, last Sunday at 1:24 PM

Yeah, memory bandwidth is often the limiting factor for floating-point workloads.