$1500/mo is $18,000/seat/annum.
Maybe Microsoft and Nvidia are on to something.
128 GB machines that can run local LLMs are a bargain even if priced $5-8k. Yes, tok/s is not quite there, but that's probably OK since the bottleneck really isn't the code; it's WTF did Uber build with all of that spend? How did it meaningfully impact their revenue in a positive direction?
> How did it meaningfully impact their revenue in a positive direction?
It probably allowed them to avoid hiring as many people to build a certain amount of software. Even if it didn't increase revenue, it could have lowered human labor costs.
> 128 GB machines that can run local LLMs are a bargain even if priced $5-8k.
Don't forget the energy costs. Searching around, advanced models use an average of 25 Wh per 1,000 tokens.
$1500/month gets you about 150M tokens.
At the aforementioned energy/token, that's 3,750 kWh.
What are your local office electricity rates/tariffs? (Hint: they are going up because of AI data centers). Even if my price and energy assumptions are wrong above, you probably aren't going to get the rates that the hyperscalers do.
Even at cheap (e.g., Texas) retail electricity rates, that many tokens will probably cost you hundreds of dollars per month. In most other electricity markets, probably far more.
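For anyone who wants to plug in their own numbers, here is a rough sketch of that arithmetic. The token volume and Wh-per-token figures are the ones quoted in this comment; the three electricity rates are hypothetical placeholders, not real tariffs, so swap in your own.

```python
# Back-of-the-envelope energy cost for a month of inference.
# Token volume and Wh/token come from the comment above; the rates
# below are ASSUMED placeholders, not actual utility tariffs.

def energy_kwh(tokens: float, wh_per_1k_tokens: float) -> float:
    """Energy in kWh needed to generate `tokens` at a given Wh per 1,000 tokens."""
    watt_hours = tokens / 1_000 * wh_per_1k_tokens
    return watt_hours / 1_000

def energy_cost_usd(tokens: float, wh_per_1k_tokens: float, usd_per_kwh: float) -> float:
    """Electricity bill in USD for that energy at a flat retail rate."""
    return energy_kwh(tokens, wh_per_1k_tokens) * usd_per_kwh

TOKENS_PER_MONTH = 150e6   # roughly what $1500/month buys, per the comment
WH_PER_1K_TOKENS = 25      # average energy figure quoted in the comment

# Hypothetical retail rates in USD per kWh.
ASSUMED_RATES = {"cheap": 0.10, "mid": 0.20, "expensive": 0.35}

print(f"energy: {energy_kwh(TOKENS_PER_MONTH, WH_PER_1K_TOKENS):,.0f} kWh/month")
for label, rate in ASSUMED_RATES.items():
    cost = energy_cost_usd(TOKENS_PER_MONTH, WH_PER_1K_TOKENS, rate)
    print(f"{label:>9} (${rate:.2f}/kWh): ${cost:,.0f}/month")
```

This covers electricity only; it ignores hardware cost, cooling, and idle draw.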
I agree on the basic point, but running $1500/mo's worth of SOTA local AI is non-trivial already, and that's a figure for a single seat. That's equivalent to generating at least 20 tok/s on a 24/7 basis, in fact probably quite a bit more than that (because open-weight models are vastly cheaper than proprietary ones even when served from reputable Western providers - reaching the same spend would take around 100 tok/s or more, which is well within datacenter hardware territory).
You could probably reach the former figure on a prosumer platform but only for very special workloads. If you spend a lot of time on prefill (which is common for agentic workloads) the outlook is even worse since that's a significant constraint for any on-prem AI.
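A quick sketch of where those tok/s figures come from, assuming a 30-day month, continuous generation, and that all of the $1500 goes to generated tokens (prefill is ignored here, which flatters local hardware). The function names are just for illustration.

```python
# Convert sustained throughput into monthly token volume and the
# per-million-token price that a fixed monthly budget would imply.
# ASSUMPTIONS: 30-day month, generation tokens only, no prefill.

SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000 seconds

def tokens_per_month(tok_per_s: float, utilization: float = 1.0) -> float:
    """Tokens generated in a month at a sustained rate and duty cycle."""
    return tok_per_s * SECONDS_PER_MONTH * utilization

def implied_usd_per_million(budget_usd: float, tok_per_s: float) -> float:
    """API price per 1M tokens at which `budget_usd` buys that monthly volume."""
    return budget_usd / (tokens_per_month(tok_per_s) / 1e6)

for rate in (20, 100):
    volume_m = tokens_per_month(rate) / 1e6
    price = implied_usd_per_million(1500, rate)
    print(f"{rate:>3} tok/s 24/7 -> {volume_m:.1f}M tokens/month, "
          f"${price:.2f} per 1M tokens for a $1500 budget")
```

So 20 tok/s around the clock matches $1500 only at roughly $29 per million tokens, while 100 tok/s corresponds to something under $6 per million.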
I think companies will eventually just buy a local AI server.
Using local hardware is expensive when it's running a complicated software stack that can break in 10,000 different ways.
These eventual local AI servers will just talk some protocol for AI and sit in the corner and nobody will think about them.
I guess they still might need access to various systems, so idk. Eventually I think someone will offer "AI in a box" though, running the latest open model or whatever.
You're far better off running your own on-premise models. Laptops are depreciating assets, do not benefit from economies of scale, have fixed specs, and result in a fragmented fleet where you need to keep models up to date. Not to mention the power consumption and cooling issues. I really don't see why companies would go in that direction.
128 GB machines can't run anything locally that is even nearly as capable as a frontier model like Claude. We can get an idea from DeepSeek v4 Pro being a 1.6T model, requiring approx. 860 GB of VRAM to run.
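As a rough sanity check on that scale, here is a sketch of the usual weights-only memory estimate (parameters times bits per weight). The 1.6T and 860 GB figures are taken from this comment as-is, not verified; the estimate assumes 1 GB = 1e9 bytes and ignores KV cache, activations, and runtime overhead.

```python
# Weights-only memory estimate: params * bits_per_weight / 8 bytes.
# ASSUMPTIONS: 1 GB = 1e9 bytes; KV cache and activations are ignored,
# so real requirements are higher than these numbers.

def weights_gb(params: float, bits_per_weight: float) -> float:
    """Memory in GB needed just to hold the weights."""
    return params * bits_per_weight / 8 / 1e9

def implied_bits_per_weight(params: float, memory_gb: float) -> float:
    """Average bits per weight implied by a quoted memory footprint."""
    return memory_gb * 1e9 * 8 / params

def max_params_billions(memory_gb: float, bits_per_weight: float) -> float:
    """Upper bound on model size (billions of params) that fits in memory."""
    return memory_gb * 1e9 * 8 / bits_per_weight / 1e9

PARAMS = 1.6e12  # model size quoted in the comment

for bits in (4, 8, 16):
    print(f"{bits:>2}-bit weights: {weights_gb(PARAMS, bits):,.0f} GB")
print(f"860 GB implies ~{implied_bits_per_weight(PARAMS, 860):.1f} bits/weight")
print(f"128 GB at 4-bit holds at most ~{max_params_billions(128, 4):.0f}B params")
```

Under these assumptions, the quoted 860 GB works out to a bit over 4 bits per weight, and a 128 GB box tops out around a quarter-trillion parameters at 4-bit before leaving any room for context.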
> it's WTF did Uber build with all of that spend?
You can ask the same about the median 330k salary in the US for Uber Engineering... and, being a bit snarky, having attended Uber engineers' talks here and there at a few conferences, it looks like they love to (re)invent internal tooling/platforms. That's pretty expensive on its own.
EDIT: I'm not saying that Uber's engineers didn't add value to the company, they absolutely did and handling the scale up they had to handle is not an easy feat. But I do challenge the notion of "what features did they create with that (LLM) spending?" of GP.
At their scale they could also just run a large on-premise or rented (basically still cloud, but cheaper) GPU cluster and route everything through that. Fixed costs, and they could even license a SOTA model's weights if they'd like.
I don't think it's necessarily about what Uber built, but about the productivity gained. If the engineers use the AI tools the correct way, it can drastically increase productivity, and that means they can actually use the LLM as a junior or associate engineer. $1500/mo is way cheaper for that level of productivity, whereas they would have had to pay far more for a human engineer.
Even if companies decided to move away from expensive models from the major labs, it is probably much more economical to pay a cloud provider to host some open-weights model, which could then be amortized across all (internal) users and do inference at a substantial batch size, rather than giving everyone their own hardware -- which means the company would need to provision for peak usage and run inference at a batch size of one.
Right - the future of LLMs is like ol' Windows XP + Dell. Commercialized "things" you run locally offline, co-designed with hardware, with a known productivity suite, and large businesses building the next-generation thing and suite on 18-month-ish release cycles.
Your last question is really important. What did they accomplish with all that spend?
I suspect there’s some mass delusion with respect to actual accomplishments as a result of LLM use. Sure, things are moving faster, but does it matter?
>WTF did Uber build with all of that spend? How did it meaningfully impact their revenue in a positive direction?
Uber (and quite a few Bay Area companies and startups) can afford to spend that money. There is no expectation of profit; Uber has lost ~62B and growing: https://uberlosses.com/
If you believe a 128 GB machine that is essentially a DGX Spark in a laptop chassis can run models comparable to SOTA, you either have never run open models on hard tasks, or you are barely scratching the surface of SOTA closed LLM capability in how you're using them.
I am wondering more and more if this becomes true as these smaller models take off. I might be old-fashioned, but I have yet to crack the workflows some of the hype people talk up, like Claude Code's Boris, where he and others talk about running hundreds of agents overnight.
I have still found the sweet spot for me is using LLMs while I stay in the driver's seat.
> WTF did Uber build with all of that spend?
WTF did anyone build with all that spend? Despite all the feel-good anecdotes about how productive folks feel using AI coding tools, there's a deafening silence when it comes to actual, demonstrated efficacy. How can we be this deeply entrenched in these workflows and still not know whether they actually do anything useful?
I think probably the correct spend is something closer to 10x that if people can figure agent coordination problems out. It's not even really about capability at this point, it's about keeping track of what agents are doing.
You can't get an edge using local models; these guys may have competitors that will spend on SOTA models. They likely won't ever consider local machines, even for some offloading scenarios, since the complexity and costs would be even higher.
18k/yr? None of the LLMs generate anything like that in value!
How is tok/s not a bottleneck? I assume most people still use AI agents interactively rather than leaving them to do their own thing during the night.
I find anything below 50 tps or so entirely unusable...
Regardless, it's apples to oranges anyway. Inference is quite cheap for open-weight models; it's just that Claude and OpenAI can charge very high margins compared to e.g. DeepSeek or the various providers on OpenRouter, since open models are a commodity.