What's the deal with Exo anyway? I've seen it described as an abandoned, unmaintained project.
Anyway, you don't really need a lot of fast RAM unless you insist on real-time responses. If you're fine letting a "good" model run overnight or thereabouts, there are things you can do to get better use out of fairly low-end hardware.
The founders of Exo ghosted the dev community and went closed-source. Nobody has heard from them. I wish people would stop recommending Exo (a tribute to their marketing) and check out GPUStack instead. Overall another rug pull by the devs as soon as they got traction.
There are a couple of alternatives to Exo, it seems: https://github.com/b4rtaz/distributed-llama and https://github.com/ray-project/ray
It’s functional if your goal is to run models that won’t fit into RAM on a single machine. Functional.
The slow interconnects (yes, even 40 Gbps Thunderbolt) severely limit both time to first token (TTFT) and tokens/second.
I tried it extensively for a few days, and ended up getting a single M3 Ultra Mac Studio, and am loving life.
You still need a lot of RAM though, right? So it's not going to be that cheap?
What sort of specs do you need?
Jeff Geerling just did a video with a cluster of 4 Framework Desktop mainboards. He put a decent amount of work into Exo and concluded it’s a VC rug pull, abandoned as soon as it won some attention.
He also explored several other open source AI scale-out libraries and reported that they’re generally far less mature than the tooling for traditional scientific cluster computing.
https://www.jeffgeerling.com/blog/2025/i-clustered-four-fram...