You can run at ~20 tokens/second on a 512GB Mac Studio M3 Ultra: https://youtu.be/ufXZI6aqOU8?si=YGowQ3cSzHDpgv4z&t=197
IIRC the 512GB Mac Studio is about $10k,
and it can be faster if you can get an MoE model of that