fotcorn • today at 3:00 PM • 1 reply

The memory bandwidth on the M4 Max is 546 GB/s and on the M5 Max it's 614 GB/s, so not a huge jump (about 12%).

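The new tensor cores, sorry, "Neural Accelerators", only really help with prompt processing (aka prefill), not with token generation. Token generation is memory-bound: every new token requires reading the model weights from memory, so bandwidth rather than compute sets the ceiling.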
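To put rough numbers on the memory-bound point, here is a back-of-envelope sketch. It assumes a dense model whose weights are all read once per generated token, and it ignores KV-cache traffic and any overhead, so it gives an upper bound at best. The 40 GB model size is a made-up illustration, not a measurement; only the two bandwidth figures come from the numbers above.

```python
def decode_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough upper bound on decode speed when every weight is read once per token."""
    return bandwidth_gb_s / model_size_gb


M4_MAX_GB_S = 546.0   # bandwidth figure quoted above
M5_MAX_GB_S = 614.0   # bandwidth figure quoted above
MODEL_SIZE_GB = 40.0  # hypothetical: size of the quantized weights in memory

m4 = decode_tokens_per_second(M4_MAX_GB_S, MODEL_SIZE_GB)
m5 = decode_tokens_per_second(M5_MAX_GB_S, MODEL_SIZE_GB)
speedup = m5 / m4  # independent of model size under this simple model

print(f"M4 Max: ~{m4:.1f} tok/s, M5 Max: ~{m5:.1f} tok/s, speedup ~{speedup:.2f}x")
```

Under this simple model the model size cancels out of the ratio, so the expected decode speedup is just the bandwidth ratio, whatever model you run.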

Hopefully the Ultra version (if it exists) has a bigger jump in memory bandwidth and maximum RAM.

Replies

anentropic • today at 3:42 PM

Do any frameworks manage to use the neural engine cores for that?

Most stuff ends up running Metal -> GPU I thought
