It's looking like we'll have Chinese OSS to thank for being able to host our own intelligence, free from the whims of proprietary megacorps.
I know it doesn't make financial sense to self-host given how cheap OSS inference APIs are now, but it's comforting not to be beholden to anyone or to need a persistent internet connection for on-premise intelligence.
Didn't expect to go back to macOS but they're basically the only feasible consumer option for running large models locally.
>...free from the whims of proprietary megacorps
In one sense yes, but the training data is not open, nor are the data selection criteria (inclusions/exclusions, censorship, safety, etc.). So we are still subject to the whims of someone much more powerful than ourselves.
The good thing is that open weights models can be finetuned to correct any biases that we may find.
> Didn't expect to go back to macOS but they're basically the only feasible consumer option for running large models locally.
I presume here you are referring to running on the device in your lap.
How about a headless linux inference box in the closet / basement?
Return of the home network!
You can get 128GB Strix Halo machines for ~US$3k.
These run some pretty decent models locally. Currently I'd recommend GPT-OSS 120B, Qwen Coder Next 80B (either Q8 or Q6 quants, depending on the speed/quality trade-off), and the very best model you can run right now, Step 3.5 Flash (ubergarm's GGUF quant) with 256K context, although that does push it to the limit. GLMs and Nemotrons are also worth trying, depending on your priorities.
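If you haven't tried this yet, here's a minimal llama-cpp-python sketch of what "running one of these locally" looks like; the GGUF path, context size, and prompt are placeholders, and some of the quants above may need specific llama.cpp builds or forks:

```python
# Minimal local-inference sketch with llama-cpp-python (pip install llama-cpp-python).
# The GGUF path and context size are placeholders; point at whatever quant you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="models/gpt-oss-120b-Q6_K.gguf",  # hypothetical filename
    n_ctx=65536,       # raise toward 256K only if you have the unified memory for the KV cache
    n_gpu_layers=-1,   # offload every layer to the GPU/iGPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarise the trade-offs of Q6 vs Q8 quants."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```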
There's clearly a big quantum leap in the SotA models that need more than 512GB of VRAM, but I expect that in a year or two the current SotA will be achievable on consumer-level hardware. If nothing else, hardware should catch up enough to run Kimi 2.5 for cheaper than 2x 512GB Mac Studio Ultras; perhaps Medusa Halo supports 512GB next year and DDR5 prices come down again, and that would put whatever the best open model of that size is next year within reach of sub-US$5k hardware.
The odd thing is that there isn't much in the whole range between 128GB and 512GB of VRAM requirements to justify the huge premium you pay for Macs in that bracket, but this could change at any point, as there are new announcements every other day.
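The back-of-envelope arithmetic behind those brackets, assuming weights dominate and ignoring KV cache (the ~1T figure for the Kimi class is an assumption):

```python
# Back-of-envelope weight memory: params * bytes-per-param at a given quantization.
# The ~1T figure is an assumption for Kimi-class MoEs; KV cache and activations add more on top.
def weight_gb(params_billions: float, bits_per_param: float) -> float:
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

for name, params in [("120B class", 120), ("~1T class", 1000)]:
    for bits in (8, 4):
        print(f"{name}: ~{weight_gb(params, bits):.0f} GB at {bits}-bit")

# 120B class: ~120 GB at 8-bit, ~60 GB at 4-bit  -> fits a 128GB Strix Halo at 4-6 bit
# ~1T class:  ~1000 GB at 8-bit, ~500 GB at 4-bit -> needs 512GB+ even at 4-bit
```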
I don't really care about being able to self host these models, but getting to a point where the hosting is commoditised so I know I can switch providers on a whim matters a great deal.
Of course, it's nice if I can run it myself as a last resort too.
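Commoditisation in practice mostly means everyone exposes the same OpenAI-compatible API, so switching providers (or dropping to a local server) is a base_url and model-name swap. A minimal sketch; the endpoints and model names are placeholders:

```python
# Provider switching via the OpenAI-compatible API (pip install openai).
# Endpoints and model names below are illustrative placeholders.
from openai import OpenAI

PROVIDERS = {
    "hosted-a": {"base_url": "https://api.provider-a.example/v1", "model": "glm-4.6"},
    "hosted-b": {"base_url": "https://api.provider-b.example/v1", "model": "glm-4.6"},
    "local":    {"base_url": "http://localhost:8080/v1",          "model": "local-gguf"},
}

def ask(provider: str, prompt: str) -> str:
    cfg = PROVIDERS[provider]
    client = OpenAI(base_url=cfg["base_url"], api_key="sk-placeholder")
    resp = client.chat.completions.create(
        model=cfg["model"],
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(ask("local", "ping"))
```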
>I know it doesn't make financial sense to self-host given how cheap OSS inference APIs are now
You can calculate the exact cost of home inference, given you know your hardware and can measure electrical consumption and compare it to your bill.
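Something like this, with your own measurements plugged in (the wattage, throughput, and tariff below are made-up placeholders):

```python
# Cost per million output tokens for home inference, from measured wall power and throughput.
# All numbers are placeholder assumptions; plug in your own meter readings and tariff.
watts_at_the_wall = 450      # measured draw while generating, in W
tokens_per_second = 30       # measured decode speed
price_per_kwh = 0.15         # your electricity tariff, in $/kWh

seconds_per_mtok = 1_000_000 / tokens_per_second
kwh_per_mtok = watts_at_the_wall * seconds_per_mtok / 3600 / 1000
print(f"~${kwh_per_mtok * price_per_kwh:.2f} per million tokens in electricity")
# ~$0.62/MTok with these numbers, excluding hardware amortisation
```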
I have no idea what cloud inference in aggregate actually costs, whether it’s profitable or a VC infused loss leader that will spike in price later.
That’s why I’m using cloud inference now to build out my local stack.
> Didn't expect to go back to macOS but they're basically the only feasible consumer option for running large models locally.
Framework Desktop! Half the memory bandwidth of M4 Max, but much cheaper.
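Half the bandwidth roughly means half the decode speed, since single-stream token generation is memory-bandwidth-bound. A rough sketch; the bandwidth figures and model sizes are approximate assumptions:

```python
# Rough upper bound on decode speed for a memory-bandwidth-bound model:
# tokens/s ~= memory bandwidth / bytes of (active) weights read per token.
# Bandwidth figures and model sizes are approximate assumptions.
def max_tok_per_s(bandwidth_gb_s: float, active_params_b: float, bits: float) -> float:
    bytes_per_token = active_params_b * 1e9 * bits / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

for machine, bw in [("Strix Halo (~256 GB/s)", 256), ("M4 Max (~546 GB/s)", 546)]:
    # e.g. a dense 70B at 4-bit vs a MoE with ~12B active params at 4-bit
    dense = max_tok_per_s(bw, 70, 4)
    moe = max_tok_per_s(bw, 12, 4)
    print(f"{machine}: ~{dense:.0f} tok/s dense-70B, ~{moe:.0f} tok/s MoE-12B-active")
```

Either way, MoE models with a small active parameter count are what make both of these boxes pleasant to use.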
Hopefully it will spread - many open options, from many entities, globally.
It is a brilliant business strategy from China, so I expect it to continue and be copied - good things.
Reminds me of Google's investment in K8s.
AFAIK they haven't released this one as OSS yet. They might eventually, but it's pretty obvious to me that at some point all or most of those more powerful Chinese models will probably stop being OSS.
They haven't published the weights yet, don't celebrate too early.
> It's looking like we'll have Chinese OSS to thank for being able to host our own intelligence, free from the whims of proprietary megacorps.
I don’t know where you draw the line between proprietary megacorp and not, but Z.ai is planning to IPO soon as a multi-billion-dollar company. If you think they don’t want to be a multi-billion-dollar megacorp like all of the other LLM companies, I think that’s a little short-sighted. These models are open weight, but I wouldn’t count them as OSS.
Also, Chinese companies aren’t the only ones releasing open-weight models. OpenAI (the company behind ChatGPT) has released open-weight models, too.
Our laptops, devices, phones, equipment, and home stuff are all powered by Chinese companies.
It wouldn't surprise me if at some point in the future my local "Alexa" assistant will be fully powered by local Chinese OSS models with Chinese GPUs and RAM.
I'm not sure being beholden to the whims of the Chinese Communist Party is an iota better than the whims of proprietary megacorps, especially given this probably will become part of a megacorp anyway.
Not going to call $30/mo for a GitHub Copilot subscription "cheap". More like "extortionate".
Yeah, that sounds great until it's running as an autonomous moltbot, semi-offline in a distributed network with access to your entire digital life, and China sneaks in some hidden training that turns these agents into an army of sleeper agents.
> doesn't make financial sense to self-host
I guess that's debatable. I regularly run out of quota on my Claude Max subscription. When that happens, I can sort of, kind of get by with my modest setup (2x RTX 3090) and a quantized Qwen3.
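For anyone curious, that fallback is basically a single tensor-parallel launch across the two cards. A minimal vLLM sketch; the model ID and quant are illustrative assumptions, not a recommendation:

```python
# Local fallback across 2x RTX 3090 (48GB total) with vLLM tensor parallelism.
# Model ID and sampling settings are illustrative; pick a quant that fits in 48GB.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-32B-AWQ",   # assumed 4-bit AWQ checkpoint that fits in 2x24GB
    tensor_parallel_size=2,       # split weights across both 3090s
    gpu_memory_utilization=0.90,
)

params = SamplingParams(temperature=0.7, max_tokens=256)
print(llm.generate(["Write a unit test for a binary search."], params)[0].outputs[0].text)
```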
And this does not even account for privacy and availability. I'm in Canada, and as the US is slowly consumed by its spiral of self-destruction, I fully expect at some point a digital iron curtain will go up. I think it's prudent to have alternatives, especially with these paradigm-shattering tools.