> 3090 would be nice
They would need a 3x speedup over the current generation to approach the 3090. An A100, which has roughly 3090-level compute but 80GB of VRAM (so it fits LLaMA 70B), does prefill at 550 tok/s on a single GPU: https://www.reddit.com/r/LocalLLaMA/comments/1ivc6vv/llamacp...
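A quick back-of-envelope check of the "80GB fits LLaMA 70B" claim, assuming weight memory dominates (in practice the KV cache and activations add more on top):

```python
# Rough VRAM needed for LLaMA 70B weights at common precisions.
PARAMS = 70e9  # 70 billion parameters

for name, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("q4", 0.5)]:
    gb = PARAMS * bytes_per_param / 1e9
    fits = "fits" if gb <= 80 else "does not fit"
    print(f"{name}: {gb:.0f} GB -> {fits} in 80 GB")
```

So full fp16 weights (140 GB) would not fit; the single-A100 setup implies 8-bit or lower quantization.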
The GB10 only matches the performance of a 3090, but it uses far less power.
I'm not sure why anyone would buy a Mac Studio instead of a GB10 machine for this use case.