Not sure why you got downvoted. 95% of people should be paying for a subscription. It's far cheaper, far more scalable, and far less hassle.
Local AI only makes sense for a couple of use cases:
- Privacy
- Constant churning on tokens
- Latency
- Availability
Local AI is "cheaper" when you already have the hardware sitting around, like an old MacBook or gaming GPU, or when the API cost is too high to bear (subscriptions will all run out if you churn 24/7). I'm surprised companies are still selling their old MacBooks to employees when they could be turning them into Beowulf clusters for cheap AI compute on long-running jobs (the marginal cost is just electricity).
If usage-based pricing is killing your vibe, find a cheaper subscription with higher limits. Here's a list of them compared on price-per-request-limit: https://codeberg.org/mutablecc/calculate-ai-cost/src/branch/...
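The price-per-request-limit comparison is simple arithmetic: monthly price divided by the plan's request allowance. A minimal sketch of that calculation; the plan names and numbers below are made-up placeholders, not figures from the linked list:

```python
# Back-of-envelope comparison of subscription plans by price per
# request limit. Plan names and numbers are hypothetical examples,
# NOT taken from the linked calculate-ai-cost list.
plans = {
    "Plan A": {"monthly_price": 20.00, "requests_per_month": 4500},
    "Plan B": {"monthly_price": 100.00, "requests_per_month": 45000},
    "Plan C": {"monthly_price": 10.00, "requests_per_month": 1500},
}

# Sort cheapest-per-request first: lower price/limit = better value.
ranked = sorted(
    plans.items(),
    key=lambda kv: kv[1]["monthly_price"] / kv[1]["requests_per_month"],
)

for name, p in ranked:
    cost_per_request = p["monthly_price"] / p["requests_per_month"]
    print(f"{name}: ${cost_per_request:.4f} per request")
```

With these placeholder numbers, the bigger plan wins on per-request cost even though its sticker price is higher, which is the usual pattern the linked list surfaces.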
I recently set up a Gemma 4 heretic fine-tune on my MacBook, mostly to prove that I could, and it's probably around 4o-level performance imo. Not fit for any real work. That said, the fact that 4o was frontier two years ago and today I can match it on local hardware, uncensored, is pretty impressive.
> 95% of people should be paying for a subscription.
Subscription plans are the "first hit is free" plans. Real pricing, once subscriptions are phased out in a year or two, is gonna be orders of magnitude higher.
I think you're right about the cost/benefit trade-off in general, but I do wonder how much of the "compaction" Codex and Claude do is to keep context fresh and how much is to save (them) runtime costs.
If you've got a 1M token context, but they constantly summarize it down to something much smaller, is it really 1M tokens of benefit? With a local model, you can use all 256k tokens on your own terms. However, I don't have any benchmarks to know.