Very nice effort. This has incredible technical depth, particularly in the DMA and QEMU sections. I also like that you didn't oversell it as the ideal Mac gaming solution. I found the AI inference results to be the most fascinating. Overall, it was a great read.
I have been bothering the VM team for years about GPU passthrough for VMs. I worked on the Apple Silicon Mac Pro, and it would have made way more sense if you could run a Linux VM and pass through the GPU that goes inside the case!
Sadly, as you can tell, they have not taken me up on my requests. Awesome that other people got it working!
Excellent article.
The game benchmarks are fun but the LLM improvements are where this gets really interesting for practical use. I love Apple platforms as an approachable way to run local models with a lot of RAM, but their relatively slow prompt processing speed is often overlooked.
> Here you can see the big issue with Macs: the prompt processing (aka “prefill”) speed. It just gets worse and worse, the longer the prompt gets. At a 4K-token prompt, which doesn’t seem very long, it takes 17 seconds for the M4 MacBook Air to parse before we even start generating a response. Meanwhile, if you strap the eGPU to it, it’ll only take 150ms. It’s 120x faster.
The prefill problem goes unnoticed when you’re just playing around with an LLM in short chats. Once you start using it for larger pieces of work, the compute limit becomes the bottleneck.
The time to first token (TTFT) charts don’t look bad until you notice that they had to be shown on a logarithmic scale because the Mac platforms were so much slower than full GPU compute.
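For a rough sense of scale, here is a back-of-the-envelope sketch of the prefill numbers in that quote. It assumes "4K" means 4096 tokens and treats the quoted 17 s and 150 ms as exact, which they probably are not; the function name is made up for illustration.

```python
# Rough prefill arithmetic from the quoted figures.
# Assumptions: "4K-token prompt" = 4096 tokens; 17 s and 150 ms are taken
# at face value even though the article likely rounded them.

def prefill_rate(tokens: int, seconds: float) -> float:
    """Prompt-processing throughput in tokens per second."""
    return tokens / seconds

PROMPT_TOKENS = 4096      # assumed meaning of "4K"
MAC_SECONDS = 17.0        # quoted M4 MacBook Air prefill time
EGPU_SECONDS = 0.150      # quoted prefill time with the eGPU attached

mac_rate = prefill_rate(PROMPT_TOKENS, MAC_SECONDS)
egpu_rate = prefill_rate(PROMPT_TOKENS, EGPU_SECONDS)
speedup = MAC_SECONDS / EGPU_SECONDS

print(f"Mac alone: {mac_rate:,.0f} tokens/s")
print(f"With eGPU: {egpu_rate:,.0f} tokens/s")
print(f"Speedup:   {speedup:.0f}x")
```

With these rounded inputs the ratio comes out near 113x, in the same ballpark as the quoted 120x, which was presumably computed from unrounded measurements.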
> As much as I hate to admit it, step one in most of my projects now is to ask AI about it. Maybe it’ll tell me something I don’t know.
Or, more likely, it will tell you something it doesn't know.
Reminds me of yesterday, when I was arguing with ChatGPT that the 5070 Ti was an actual video card. It kept trying to correct me by saying I must have meant a 4070 Ti, since no such 5070 Ti card exists.
This is pretty impressive. My impression was that eGPUs simply do not work with Apple Silicon.
(EDIT: Apple agrees with my impression. “To use an eGPU, a Mac with an Intel processor is required.” And, on top of that, the officially supported eGPUs were all AMD, not NVIDIA. https://support.apple.com/en-us/102363)
This is proper mad science, love it
Nicely done! Glad to see real hacking is still alive in the age of AI.
This seems pretty useful for AI inference if it can pass Apple approval. I've wanted to use my NVIDIA GPUs with a Mac mini, and this would let it run CUDA directly. Very cool!
I'm guessing the x86 emulation is because Windows games are rarely built for ARM, right? I was kind of curious how an ARM VM would fare. Anyway, awesome article.
Once eGPUs work on Apple Silicon, there will be little reason to own a PC.
damn
lol, is there a list of games that Mac Pros can support, though?
Wow, phenomenal project and write-up, thanks for sharing it.
> no - not in any practical sense today, and "maybe" only in a very deep, borderline-impractical research sense.
This is why humans will always rule over crappy LLMs.
> As much as I hate to admit it, step one in most of my projects now is to ask AI about it. Maybe it’ll tell me something I don’t know.
It’s these people, not the ones who refuse to use LLMs, who are, as they say, “cooked”.
> Because OpenGL is not well-supported anymore on macOS, the game is completely unplayable there, even with CrossOver. Ironically, it plays totally fine on a Windows PC, but this is a game you literally can’t play on Mac without this eGPU setup.
I understand that this is true, but it seems that Doom does support Vulkan; you would need to add VK_NV_glsl_shader to MoltenVK. That is probably much less work than what went into hanging an RTX 5090 off of an M4. Still, kudos to Scott, and the local AI inference speeds are pretty cool. What a crazy project! <applause>