logoalt Hacker News

rmunntoday at 3:22 PM10 repliesview on HN

This is the kind of thing that Anthropic et al should be worried about. As it becomes easier and easier to run local models, the ceiling of what they'll be able to charge will get lower and lower. Not that nobody will be willing to pay $$$$$ per month, but a lot of people are going to multiply the per-month charge by 12 or 24 and say "Could I set up a local model for less than that, and have it pay for itself within a year or two?" And if a significant portion of customers decide to buy instead of rent, the companies whose business model is entirely centered around renting will suddenly find themselves hurting for customers.


Replies

sathackrtoday at 3:35 PM

The opposite of that has been happening for 20 years now with cloud compute.

It won't happen with AI models either.

It's almost ingrained in the American business model now. Outsource everything. Nobody wants to manage a room full of servers when they can spend 2-3x as much and outsource that headache along with the responsibility for it.

Same will happen with AI. Whether that means paying Anthropic that premium or paying AWS.

I'm in a relatively small business, we recently had an outage related to our local infrastructure.

I got pressure from the CEO saying it wasn't reliable to host our own infrastructure anymore even though our total internal down time over the last 5 years is significantly less than even a single of the larger recent AWS outages.

Everyone wants to shuck the chore and the responsibility.

show 8 replies
pessimizertoday at 6:34 PM

> but a lot of people are going to multiply the per-month charge by 12 or 24 and say "Could I set up a local model for less than that, and have it pay for itself within a year or two?" And if a significant portion of customers decide to buy instead of rent, the companies whose business model is entirely centered around renting will suddenly find themselves hurting for customers.

And those are going to all be big enterprise companies that probably will set up LLM services entirely in-house, because they've got the headcount to utilize servers at 100%.

I wonder if there will be (or is currently) business in selling their compute while they're not working, to opposite time zones, etc.

What's left for the big providers will be the dregs of individual subscriptions and small businesses that at their least paranoid might let employees just use their own subscriptions for work.

indoordin0saurtoday at 3:33 PM

I'm curious when coding-heavy companies will start running their own on-prem AI clusters. Has anyone had the idea to sell something like 4 GPU machine an engineering team could throw in a closet somewhere and run whatever they want on it? I imagine this won't appeal to everybody but with the trust issues the hyperscalers have developed hoovering up people's data and using it to train their models, I imagine some will find value in a machine and model they have transparent control over including the option to walk over and unplug the thing.

show 1 reply
storustoday at 4:31 PM

They are working hard on you not being able to run a thing locally. OpenAI buys all RAM on the spot market, causing the rise of RAM/VRAM prices 6x, making GPUs and decent computers unreachable for the majority of the population. OK, some richer folks might be able to get a 512GB MacStudio or a single RTX Pro 6000 for 13k and be able to run some decent local models, but the vast majority will need to use API. And at some point Nvidia might say: "We don't sell that many 6000s, so let's just cancel them altogether as we can gain 4x profit on datacenter-only GPUs" and then they'll become unobtainium and no private person would ever be able to run anything decent (~1 year behind the frontier) locally.

frollogastontoday at 6:02 PM

Anthropic isn't just renting out compute, they're renting out a closed model that's better than anything you can download for free. So they're rightfully focused on preventing others from distilling their model.

bityardtoday at 4:31 PM

The general consensus is that local models will continue to improve drastically, but hosted models will as well. There will _always_ be a pretty big gulf of capability between what you can do with a desk full of hardware at home vs a few racks of hardware in a datacenter. That seems to be the real "moat" of hosted models at this point in time: access to capital.

What's interesting/exciting is that local models are _already_ quite good at tasks we never imagined AI _ever_ doing before ChatGPT hit the scene just a few short years ago.

We're also in an interesting point in time where companies are releasing the fruits of their research/labor (the LLMs) to the general public for free. For now, I think they see it in their best interest to gain mindshare and rapport, as well as advancing the state of the art in smaller LLMs ("a rising tide lifts all boats") but I fear and expect that these will dry up as the major players buy the minor players, and all will seek a return on their considerable investments in AI research.

show 2 replies
wuliwongtoday at 3:40 PM

These local models can do some of the work the non-frontier models can do but for me, that's not worth much. If I am just using Sonnet 4.6, I can pretty much work all day on the $20/month plan. And Sonnet is still a way more powerful model than a one you could self host on an M2 mac.

If things change to token usage billing for everyone, maybe I'll be singing a different tune but on a subscription, I don't think it makes sense financially.

Fun? Yes. Financially sound? No.

icodertoday at 3:36 PM

What I don't understand is that on one hand we read 'what they charge is much less than it costs them' and on the other hand this thread seems to suggest that 'what they charge is more than it would cost me'.

show 4 replies
themaninthedarktoday at 3:24 PM

Maybe that is why they are buying up as much hardware as they can? If their service is the only game in town.

show 1 reply
sbmthakurtoday at 4:17 PM

Someone was able to run gemma-4-26B-A4B on an i5-8500 with 32 gb ram with NO GPU. Granted this is an extreme example these MoE models are value for money for a lot of use cases.

https://www.reddit.com/r/LocalLLaMA/s/YontVNVRbL