I'm constantly tempted by the idealism of this experience, but when you factor in the performance of the models you can actually run locally, and the cost of running better ones on-demand in the cloud, it's really just a fun hobby rather than a viable strategy for improving your life.
As the hardware continues to iterate at a rapid pace, anything you pick up second-hand will still depreciate at that pace, making any real investment in hardware unjustifiable.
Coupled with the dramatically inferior performance of the weights you would be running in a local environment, it's just not worth it.
I expect this will change in the future, and am excited to invest in a local inference stack when the weights become available. Until then, you're idling a relatively expensive, rapidly depreciating asset.
> As the hardware continues to iterate at a rapid pace, anything you pick up second-hand will still depreciate at that pace, making any real investment in hardware unjustifiable.
Can you explain your rationale? It seems that the worst case scenario is that your setup might not be the most performant ever, but it will still work and run models just as it always did.
This sounds like a classic, very basic opex-vs-capex tradeoff analysis, and those are renowned for showing that, in financial terms, cloud providers are preferable only in one very specific corner case: a short-term investment to jump-start infrastructure when you don't yet know your scaling needs. That is not the case for LLMs.
OP seems to have invested around $600. That is around three months' worth of an equivalent EC2 instance. Knowing this, can you support your rationale with numbers?
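To make that capex-vs-opex question concrete, here is a rough back-of-envelope sketch in Python using the figures from the comment above ($600 of hardware, roughly $200/month for an equivalent cloud instance); the power-draw and electricity-price numbers are my own illustrative assumptions, not anything from the thread:

```python
# Back-of-envelope capex vs. opex, using the figures quoted in the comment above.
# Power draw, usage hours, and electricity price are assumptions for illustration.
hardware_cost = 600.0    # one-time local purchase (from the comment)
cloud_monthly = 200.0    # ~ $600 / "3 months of an equivalent EC2 instance"
power_watts = 300        # assumed average draw while the box is in use
hours_per_day = 4        # assumed usage
kwh_price = 0.15         # assumed $/kWh

local_monthly_power = power_watts / 1000 * hours_per_day * 30 * kwh_price
breakeven_months = hardware_cost / (cloud_monthly - local_monthly_power)

print(f"local power cost: ${local_monthly_power:.2f}/month")
print(f"break-even vs. cloud after ~{breakeven_months:.1f} months")
```

Under these assumptions the local box pays for itself in roughly three months; the interesting debate is whether the cloud models are enough better to be worth the recurring spend.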
> I expect this will change in the future
I'm really hoping for that too. As I've started to adopt Claude Code more and more into my workflow, I don't want to depend on a company for day-to-day coding tasks. I don't want to have to worry about rate limits or API spend, or having to put up $100-$200/mo for this. I don't want everything I do to be potentially monitored or mined by the AI company I use.
To me, this is very similar to why all of the smart-home stuff I've purchased must have local control, why I run my own smart-home software, and why I self-host the bits that let me access it from outside my home. I don't want any of it tied to some company that could disappear tomorrow, jack up their pricing, or sell my data to third parties. Or even use my data for their own purposes.
But yeah, I can't see myself trying to set any LLMs up for my own use right now, either on hardware I own, or in a VPS I manage myself. The cost is very high (I'm only paying Anthropic $20/mo right now, and I'm very happy with what I get for that price), and it's just too fiddly and requires too much knowledge to set up and maintain, knowledge that I'm not all that interested in acquiring right now. Some people enjoy doing that, but that's not me. And the current open models and tooling around them just don't seem to be in the same class as what you can get from Anthropic et al.
But yes, I hope and expect this will change!
I expect it will never change. In two years if there is a local option as good as GPT-5 there will be a much better cloud option and you'll have the same tradeoffs to make.
Hardware is slower to design and manufacture than we software people expect.
What I think we’ll see is: people will realize some things that suck in the current first-generation of laptop NPUs. The next generation of that hardware will get better as a result. The software should generally get better and lighter. We’re currently at step -.5 here, because ~nobody has bought these laptops yet! This will happen in a couple years.
Meanwhile, eventually the cloud LLM hosts will run out of investors' money to subsidize our use of their computers. They'll have to actually start charging enough to make a profit. On top of what local LLM folks have to pay, the cloud folks will have to pay:
* Their investors
* Their security folks
* The disposal costs for all those obsolete NVIDIA cards
Plus the remote LLM companies will have the fundamental disadvantage that your helpful buddy that you use as a psychologist in a pinch is also reporting all your darkest fears to Microsoft or whoever. Or your dev tools might be recycling all the work you thought you were doing for your job, back into their training set. And might be turned off. It just seems wildly unappealing.
>but when you factor in the performance of the models you have access to, and the cost of running them on-demand in a cloud, it's really just a fun hobby instead of a viable strategy to benefit your life.
It's because people are thinking too linearly about this, equating model size with usability.
Without going into too much detail, because this may be a viable business plan for me: I've had very good success with a Gemma QAT model that runs quite well on a 3090, wrapped up in a very custom agent format that goes beyond simple prompt->response use. It can do things that even the full-size large language models fail to do.
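For a sense of what that kind of setup can look like, here is a minimal sketch of a bounded agent loop over a quantized Gemma model, assuming the weights are served locally through Ollama; the model tag, system prompt, and loop structure are my own illustrative assumptions, not the commenter's actual system:

```python
# Minimal sketch of a local agent loop over a QAT Gemma model served by Ollama.
# The model tag and the PLAN/DONE protocol are illustrative assumptions.
import ollama

MODEL = "gemma3:27b-it-qat"  # assumed quantization-aware-trained build; pick one that fits your VRAM

def run_step(history):
    """One agent step: ask the local model what to do next."""
    resp = ollama.chat(model=MODEL, messages=history)
    return resp["message"]["content"]

history = [
    {"role": "system", "content": "You are a planning agent. Think step by step, "
                                  "then output either PLAN: <next action> or DONE: <answer>."},
    {"role": "user", "content": "Summarize the tradeoffs of local vs. cloud inference."},
]

for _ in range(4):  # bounded multi-turn loop instead of a single prompt->response call
    reply = run_step(history)
    print(reply)
    if reply.strip().startswith("DONE:"):
        break
    history.append({"role": "assistant", "content": reply})
    history.append({"role": "user", "content": "Continue."})
```

The point isn't the specific prompts; it's that wrapping a mid-size local model in a structured loop can recover a lot of the capability gap for narrow tasks.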
> anything you pick up second-hand will still depreciate at that pace
Not really? The people who do local inference most (from what I've seen) are owners of Apple Silicon and Nvidia hardware. Apple Silicon has ~7 years of decent-enough LLM support under its belt, and Nvidia is only now starting to deprecate 11-year-old GPU hardware in its drivers.
If you bought a decently powerful inference machine 3 to 5 years ago, it's probably still plugging away with great tok/s. Maybe even faster inference, thanks to MoE architectures or improvements in the backend.
AFAICT, the RTX 4090 I bought in 2023 has actually appreciated rather than depreciated.
Really depends on whether a local model satisfies your own usage, right? If it works well enough locally, just package it up and be content. As long as it's providing value now, at least it's local...
Once the models behind APIs start monetizing their results, their outputs will get much worse. It's just a matter of time.
Everything you're saying is FUD. There's immense value in being able to run locally or remotely as you please, and part of that value is the knowledge you gain.
Also, at the end of the day it's about the value created. AI may let some people generate more stuff, but overall value still tends to align with whoever was better at the craft pre-AI, not with who pays more.
Anything you build in the LLM cloud will be, must be, rug-pulled: via vendor lock-in, utter bankruptcy, or simply a change to the model, context, or prompts.
Unless you're a billionaire with pull, you're building tools you can't control and can't own, ephemeral wisps.
That's assuming you can even trust these large models to be consistent.
It's not that bad. If you're an adult making a living wage, and you're literate in some IT principles and AI operations know-how, it's not a major one-time investment. And you can always learn. I'm sure your argument deterred a lot of our parents' generation from buying computers, too. Where would most of us be if not for that? This is a second transistor moment, right in our lifetime.
Life is about balance. If you Boglehead everything and then die before retirement, did you really live?
Running LLMs at home is a repeat of the mess we make with "run a K8s cluster at home" thinking.
You're not OpenAI or Google. Just use pytorch, opencv, etc to build the small models you need.
You don't even need Docker! You can share it with friends over a simple code-based HTTP router app and pre-shared certs.
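As one possible shape of that idea, here is a minimal sketch: a tiny HTTPS front door that checks a pre-shared token and forwards chat requests to a local model server. The upstream URL, certificate filenames, and token scheme are all my own assumptions for illustration, not a specific recommended stack:

```python
# Minimal sketch: an HTTPS endpoint that forwards prompts to a local model server,
# gated by a pre-shared token. Assumes an OpenAI-compatible local server (e.g. one
# listening on :8080) and a self-signed cert/key pair you've shared with friends.
import http.server
import ssl
import urllib.request

PRESHARED_TOKEN = "change-me"  # hand this to friends out of band
UPSTREAM = "http://127.0.0.1:8080/v1/chat/completions"  # assumed local model server

class Router(http.server.BaseHTTPRequestHandler):
    def do_POST(self):
        # Reject anyone who doesn't present the shared secret.
        if self.headers.get("X-Token") != PRESHARED_TOKEN:
            self.send_error(403)
            return
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        req = urllib.request.Request(
            UPSTREAM, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            payload = resp.read()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    server = http.server.HTTPServer(("0.0.0.0", 8443), Router)
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain("server.crt", "server.key")  # pre-shared / self-signed cert
    server.socket = ctx.wrap_socket(server.socket, server_side=True)
    server.serve_forever()
```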
You're recreating the patterns required to manage a massive data center in 2-3 computers in your closet. That's insane.
This is especially true since AI is a large multiplicative factor in your productivity.
If cloud LLMs have 10 more IQ points than the local LLM, within a month you'll notice you're struggling to keep up with the dude who just used the cloud LLM.
LocalLlama is for hobbyists, or for people whose job depends on running models locally.
This is not a one-time upfront setup cost vs. later payoff tradeoff. It's a tradeoff you make on every query, and it compounds pretty quickly.
Edit: I expect nothing better than downvotes from this crowd. How HN has fallen on AI will be a case study for the ages.
I think the local LLM scene is very fun and I enjoy following what people do.
However, every time I run local models on my MacBook Pro with a ton of RAM, I'm reminded of the gap between locally hosted models and the frontier models I can get for $20/month or a nominal price per token from various providers. The difference in speed and quality is massive.
The current local models are very impressive, but they’re still a big step behind the SaaS frontier models. I feel like the benchmark charts don’t capture this gap well, presumably because the models are trained to perform well on those benchmarks.
I already find the frontier models from OpenAI and Anthropic to be slow and frequently error prone, so dropping speed and quality even further isn’t attractive.
I agree that it’s fun as a hobby or for people who can’t or won’t take any privacy risks. For me, I’d rather wait and see what an M5 or M6 MacBook Pro with 128GB of RAM can do before I start trying to put together another dedicated purchase for LLMs.