How do you run this kind of model at home? On a CPU, on a machine with about 1TB of RAM?
You can do it slowly with ik_llama.cpp, lots of RAM, and one good GPU. Regular llama.cpp works too, but the ik fork has some enhancements that make this sort of thing more tolerable.
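Roughly, that setup looks like the sketch below: quantized weights live in system RAM, the GPU holds the attention layers and KV cache, and the big MoE expert tensors are pinned to CPU with a tensor override. The model path, quant, and thread count here are hypothetical placeholders, not from this thread; the flags shown (`-ngl`, `-ot`, `-t`, `-c`) exist in llama.cpp and the ik_llama.cpp fork, but check your build's `--help` for exact behavior.

```shell
# Build with CUDA support (same cmake flow as mainline llama.cpp):
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# -ngl 99 nominally offloads all layers to the GPU, while
# -ot "exps=CPU" overrides that for tensors whose names match "exps"
# (the large MoE expert weights), keeping them in system RAM where
# the CPU does those matmuls. Path, quant, and -t are placeholders.
./build/bin/llama-server \
  -m /models/your-big-moe-quant.gguf \
  -ngl 99 \
  -ot "exps=CPU" \
  -t 32 \
  -c 8192
```

The trick is that in a MoE model only a few experts fire per token, so streaming them from RAM is slow but workable, while the GPU keeps the always-active attention path fast.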
Two 512GB Mac Studios connected over Thunderbolt 5.
Wow, it's 690GB of downloaded data, so yeah, 1TB sounds about right. Not even my two Strix Halo machines paired can do this, damn.