I take a peek every month or so at spend for my company and notice more and more people are consuming $1k in tokens a month, and it is bewildering to me how. I use llms daily, and see anywhere from $200-$400 tops. This is using the most expensive models, in deep thinking mode. So I'm not a Luddite against the usage of them. I just can't figure out _how_ to burn that much money a month responsibly.
I genuinely challenge someone spending $5-$10k a month to demonstrate how that turns into $50-$100k in value. At a corporate level, I'd much rather hire a junior engineer who spends $100-$200/month and becomes productive than try to rationalize $100k/year in token spend.
First: There's the obvious "If the company is letting me do it, I'll be wasteful." This includes not clearing/compacting the context often. Opus now has a 1M context window, and quality is good to at least 200K. So each query is burning a lot of tokens until you clear/compact.
People have already mentioned the size/complexity of the codebase. I'm new to my team and the codebase isn't huge, but it's large enough that there are plenty of parts I have little understanding about. When I'm given a task, then yes, I definitely go to Claude and ask it to find the relevant parts of code so I can understand the existing workflow before even attempting to change it.
The downside is that I don't build expertise. But the reality is that with Claude, I can get the work done in 1 day that would take me 5 days of struggling, and if everyone is doing it, I can't be left behind. So I take the middle route - I get it done in 2-3 days instead of 1 so I can at least spend some time with the code.
Especially with AI, the rate at which code changes in our codebase is insane. So I built a tool that takes a pull request, and tells the LLM to go deep and explain to me what that pull request does. (Note: I'm not the reviewer, I just want to keep tabs on the work that is going on in the team).
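Roughly, the tool is just this shape (a made-up sketch - the CLI calls and prompt wording are my assumptions, not the real implementation):

```python
# Hypothetical sketch: pull a PR's diff and have Claude Code explain it.
# Assumes the GitHub CLI (gh) and the Claude Code CLI (claude) are on PATH.
import subprocess

def explain_pr(pr_number: int) -> str:
    # Fetch the raw diff for the pull request.
    diff = subprocess.run(
        ["gh", "pr", "diff", str(pr_number)],
        capture_output=True, text=True, check=True,
    ).stdout
    prompt = (
        "Go deep and explain what this pull request does, module by module, "
        "and flag anything surprising. I'm not reviewing it, I just want to "
        "keep tabs on the change:\n\n" + diff
    )
    # -p runs Claude Code non-interactively and prints the response.
    return subprocess.run(
        ["claude", "-p", prompt],
        capture_output=True, text=True, check=True,
    ).stdout
```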
And this is just the beginning. I haven't actually spent time to come up with more ways to use the LLM to help me.
My usage is similar to yours, but if I were fairly experienced with the code base, I'd do a lot more. I haven't asked, but I suspect there are people in my team who go over $1K/month.
As always, the bottleneck is proper testing and reviews.
Edit: I'll also add that for not-so-important code used within the company, I suspect most people are going full-AI with it. For my personal (non-work) code, I just let the AI code it all - the risk is usually very low (and problems are caught quickly). If someone is using the "superpowers" skill, then even for basic features you can burn lots of tokens. I usually start with 20-40K tokens and end up with 80-90K tokens when it's finished. Which means that many of the requests prior to completion were sending in close to 80K tokens. Multiply that with the number of queries, etc.
Wasteful, but if someone else is paying ...
I have anecdotal examples of Claude Code choosing a solution to a problem that is ridiculously token inefficient.
One example: I was giving several agents different sub-problems to solve in a complex ML/forecasting problem. Each agent would write + run + read a Jupyter notebook. This worked OK - the notebooks would be verbose but it was fine... until one of them wrote out hundreds of thousands of rows to a cell output, creating a 500MB ipynb file. Claude tried several times to read it and it used my entire context limit.
The solution was to prescribe a better structure for doing the work (via CLI analysis scripts + folders to save research results to). But this required some planning, thought, and design work by me, the operator.
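To give a concrete (made-up) example of that structure: each analysis step becomes a small CLI script that writes full results to a folder and only prints a compact summary for the agent to read.

```python
# Hypothetical analysis script: full detail goes to disk, stdout stays tiny so
# the agent never drags hundreds of thousands of rows into its context.
import argparse
from pathlib import Path

import pandas as pd

def main() -> None:
    parser = argparse.ArgumentParser(description="Summarize forecast errors.")
    parser.add_argument("input_csv")
    parser.add_argument("--out-dir", default="research_results")
    args = parser.parse_args()

    df = pd.read_csv(args.input_csv)
    out_dir = Path(args.out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Full detail is saved where the agent can open it selectively later.
    df.describe().to_csv(out_dir / "error_summary.csv")

    # Only a short summary goes back into the agent's context.
    print(f"rows={len(df)} cols={len(df.columns)} -> {out_dir / 'error_summary.csv'}")

if __name__ == "__main__":
    main()
```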
When I see people spending $10k a month in tokens, I can only assume they are taking lazy, hands-off approaches to solving problems with the expensive hammer that is Claude Code. E.g. having Claude read all your emails every day... the lazy solution is to simply do that, but a smarter solution is to first filter the email body HTML to remove the noise.
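Something as simple as the following (a minimal sketch assuming BeautifulSoup, not anyone's production code) strips most of the waste before the model ever sees the email:

```python
# Strip markup, scripts, and tracking junk from an email body so the model is
# paid to read content, not HTML noise.
from bs4 import BeautifulSoup

def email_to_plain_text(html_body: str) -> str:
    soup = BeautifulSoup(html_body, "html.parser")
    # Drop tags that are pure noise for a summary.
    for tag in soup(["script", "style", "head", "img", "svg"]):
        tag.decompose()
    # Collapse whitespace left behind by layout tables.
    return " ".join(soup.get_text(separator=" ").split())
```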
Really depends on the repo you’re working in.
If it’s very large, especially if the tool needs to refer to documentation for a lot of custom frameworks and APIs, you often end up needing very large context windows that burn through tokens faster.
If it’s smaller or sticks with common frameworks that the model was trained on, it’s able to do a lot more with smaller context windows and token usage is way lower.
If Uber is like most other companies, there's a leaderboard for AI tokens consumed. If maximizing your token usage is going to get you to the top of the leaderboard, and therefore promoted for "productivity", people are going to find creative ways to be "productive".
> I just can't figure out _how_ to burn that much money a month responsibly.
Same, but in regards to quotas. I'm on the 200 EUR ChatGPT plan, so presumably have the highest quota, using the "most expensive" models, on highest reasoning, in fast mode (1.5x quota usage), and after a full day of almost exclusively doing programming with agents, I still get nowhere close to hitting my quota.
In fact, since I started using agents for coding, the only time I even got close was when I was doing cross-platform development with the same setup as above, but on three computers at the same time; then I almost hit my weekly quota. But normally I get down to ~20% of the quota and almost never below that. I don't see how I could either - I'm already doing lots of prompts and queries "for fun", basically.
One thing that stands out is that it sounds like you're using LLMs for only one part of your process. You're having LLMs help you write code, but the code you're writing doesn't itself make use of LLMs.
My current job basically involves trying to improve processes that themselves make heavy use of LLMs. Once you have multiple agents in parallel running multiple experiments on improving the performance of primarily LLM driven tools it's not that hard to get your token usage pretty high.
I'm on the same page. Do people not analyze the problems themselves? Are they just copy/pasting their entire ticket description into Claude Code and having it iterate until they land on something that works?
I don't get it.
Several options for how to burn that amount of money without specifically looking to tokenmaxx:
- Agents that spawn other agents
- Telling agents to go look at the entire codebase or at a lot of documents constantly
- MCP/API use with a lot of noise
- Loops where the agent is running unattended.
I do think it's not really responsible use and a loop where the agent is trying to fix CI for one hour for something that would take you five minutes (for example) is absurd. But people do that.
I've been working on a project to build a new Postgres based database in Rust[0]. I'm four weeks in and have 93% of the Postgres test suite passing. I've found agents to have worked really well for this as I have an existing codebase that has good architecture that I can point my agents at. It's also easy to debug as I can diff what my agents are doing and what Postgres is doing.
I've had to get multiple codex accounts, but there was a brief period of time where I tried API usage to see how expensive it would be. In about an hour I spent $650 of credits. I had codex estimate how much I would be spending if I was doing pure API usage and it estimated around $10k/week.
For context, Postgres is 1M lines of C code. It's looking like pgrust will come out at fewer lines of code than Postgres, and at peak I was adding over 100k lines of code in a day. I would estimate it would take a team of 5 software engineers at least 3 years to get to where I got in a month with a couple of Codex subscriptions.
Claude is a mediocre programmer that can do great things with great supervision, but it can't make mediocre human programmers into good ones, because they can't provide great supervision.
It will try and try and try, though.
I spend 400-500 dollars per day during active development at this point. However with more aggressive task breakdowns I can spend ~5k per day.
These spend rates are in part due to operating on a larger code base. Operating on a larger code base means more time searching and understanding the code, tests, test output. They are also due to going all-in on agentic coding.
It can feel painfully slow to go back to coding by hand when for a dollar you can build the same functionality in a minute. Now do this with multiple sessions and you can see where the cost goes.
> responsibly
There’s your problem. You’re trying to be responsible instead of trying to burn tokens so you can have your name on top of some leaderboard for most wasteful AI users.
I don't use automated agent workflows or anything, I just use Claude as a pair programmer of sorts. A month or so ago I used Claude Opus 4.6 for 2-4 hours on API pricing and racked up $20 in spend, which surprised me since that was much higher than my usual.
I don't know about $10,000, but I can see hitting $1,000 pretty easily if you aren't looking at the costs.
It turns out writing good prompts helps to keep token usage down as the model wastes fewer tokens discovering context it needs that wasn't hinted at in the prompt.
Whereas a good prompt will give solid leads to all the specifics needed to complete the task.
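A toy illustration (the file and function names are invented):

```python
# A vague prompt forces the agent to burn tokens rediscovering context;
# a specific one hands it the leads up front.
VAGUE_PROMPT = "Fix the flaky checkout test."

SPECIFIC_PROMPT = """\
tests/checkout/test_retry.py::test_payment_retry is flaky.
The retry logic lives in services/payments/retry.py (backoff_schedule()).
The failure is a race against the fake clock set up in tests/conftest.py.
Fix the test without changing production behavior, and run only that file.
"""
```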
> I genuinely challenge someone spending $5-$10k a month to demonstrate how that turns into $50-$100k in value
At a lot of businesses, $5-10k/mo of AI spend doesn't even translate into $5-10k/mo of value. Churning out code was rarely the business value bottleneck. It was convenient for everybody else to blame developers not writing code fast enough for their failures. Now they have no excuse, but I doubt they'll own up.
Multimedia feedback can burn much more than that - for example, if I'm sending frames of a 3D engine's output. I would like to send it a video if I could, but that is too expensive, and I'm sure there are orgs out there that really do want every frame in a prompt doing something. This can be exponential depending on the application. I recently wrote a Milkdrop visualization analyzer; I could have sent thousands of frames for each one. I didn't, but well, I wish I could haha.
Try the Jira MCP server.
Yeah, I use Claude Code to do security reviews. For every CVE that Wiz flags, I have Claude Code do a reachability analysis.
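The loop is roughly this (a hypothetical sketch - the CVE list, prompt wording, and CLI usage are placeholders, not our actual tooling):

```python
# For each CVE the scanner flags, ask Claude Code whether the vulnerable code
# path is actually reachable from our repository.
import subprocess

flagged_cves = ["CVE-2024-12345", "CVE-2024-67890"]  # e.g. exported from the scanner

for cve in flagged_cves:
    prompt = (
        f"{cve} was flagged against one of our dependencies. "
        "Search this repository and determine whether the vulnerable code path "
        "is reachable from our code. Cite the files and call sites you checked."
    )
    report = subprocess.run(
        ["claude", "-p", prompt], capture_output=True, text=True, check=True
    ).stdout
    print(f"=== {cve} ===\n{report}")
```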
I typically consume about $200/month doing this. Most of our engineers are in the $200-400 range, with a few people around $1,000.
But then there's one guy who's not only hitting $8,000, but supposedly has nearly 300,000 lines of code accepted (Note: This means he's accepted the lines of code from Claude, not that he's committed it). I can't figure out how.
Do lots of deep research and code reviews on large legacy codebases. I've created lots of documentation to reduce token consumption but it's still a lot of token consumption.
The answer may be agentic loops that keep cycling through the same problem again and again until they land on a non-erroneous outcome. Some people boast of having multiple such agents working in parallel on different problems, tending to one while another is processing, perhaps not unlike the movie mad scientist who runs around the lab throwing switches while laughing maniacally at the prospect of his impending success.
> in deep thinking mode
You mean deep brute-force mode of search results parsing themselves…
Don't underestimate corporate waste. If it's not someone's job to care for something, they really won't.
Even before this AI wave, it was common for me to see dev environments spinning on AWS at like $3k/month that hadn't been used in months.
There was a tool posted called codeburn that showed a breakdown of what activity your usage was spent on. Mine was almost all coding but other people in the thread said >50% of their usage was conversation. I’m inclined to agree with you that someone who is reasonable with their compute usage is likely to be thinking things through rather than just burning tokens to get an LLM to solve the problem
> I just can't figure out _how_ to burn that much money a month responsibly.
I always have a few agents (2-5) doing research and working on plans in parallel. A plan is a thorough and unambiguous document describing the process to implement some feature. It contains goals, non-goals, data models, access patterns, explicit semantics, migrations, phasing, requirements, acceptance criteria, phased and final. Plans often require speculative work to formulate. Plans take hours to days to a couple of weeks to write. Humans may review the plans or derived RFCs. Chiefly AI reviews the code (multiple agents with differing prompts until a fixed point is reached between them). Tests and formal methods are meant to do heavy lifting.
In my highest volume weeks, I ship low hundreds of thousands of lines of software not counting changes to deps.
> At a corporate level, I'd much rather hire a junior engineer
Any formulation of a problem sufficient for a truly junior engineer to execute is better given to an agent. The solution is cheaper, faster, and likely better. If the latter doesn't hold, 10 independent solutions are still cheaper and faster than a junior engineer.
There is no longer any likely path to teaching a junior engineer the trade.
I use it as an IDE. I am a security engineer, but there are a bunch of predictable things I need to write code for: onboarding logs, writing detection rules, SOAR-type stuff. It makes a diff and locally tests all the permutations I describe, then I review the code.
1. Worktrees
2. Multiple simultaneous projects
3. Orchestration that includes handling of CI workflow
4. Active work to further improve or refine tooling
5. Experimentation producing muscle memory as experience versus code output
In addition to what folks are saying here about larger code bases and multiple features at once, there’s also the time requirement to be efficient. It takes time to be more efficient with token usage and it may not be worth it for some of these companies so… burn away until we start to get more data and then we’ll check in.
I think companies are charged API prices vs individual prices. That alone is 10x for Anthropic. Not sure though.
On the OpenAI side, GPT-5.5 generates spend at a prolific rate that's even faster if you use it through an ACP connection in a tool like Zed. I used to never think about Codex rate limits and now I'm hitting mine every 5 hour block and spending ~$100/day on top of that in adhoc credit purchases.
I also don't think a lot of people know some of the more advanced context management tricks like /rewind /fork /tree to take advantage of prefix caching
I don't think it's about value. Tokenmaxxing is a thing now since that one CEO said he wants his $250k/yr devs to use $400-$500k/yr in tokens, so now it's all about how many agents can you have running concurrent tasks all day long.
In our org it's people that have too much stuff in their context, every mcp in the world installed, GTD, PAI, OpenClaw. I'm equally baffled how one can spend that much money during their day to day.
It turns into $50k to $100k or more of value for the employee the moment upper management makes AI spend a personal performance target, as has happened across most corporations.
Your estimates do not account for speed of delivery. If an AI can deliver 10x faster, the target is less than 10x a dev salary.
But 10x faster also gets you to market sooner. Which has value.
$400 * 23 business days would be $9k. Sounds ballpark to me
At least your workplace doesn't frame raw usage as a leaderboard, with awards given out for topping it
"Responsibly" is the key word doing A LOT of lifting there.
You're probably generating new code rather than analyzing old code for "improvement".
> I use llms daily
this is your “problem” - you are missing the “nightly” part. on my box LLMs run 24/7 :)
Many companies actively hide the cost from their employees.
Idk about Uber in particular, but aside from legit programmers using AI to help them do legit work faster, there are people spamming it for metrics. And the hiring pipeline has gotten screwed up somehow: like half the people who reached the onsite interview for a technical role lied about all their technical skills, or they didn't lie and managed to pass hiring but then only take the tasks that AI can solo. And if it can't, they waste tokens until giving up.
Do you run 20 Claude Code agents on Max for 8 hours a day? :)
a good way to prevent companies from adopting AI (and keeping your job) is to waste tokens making AI cost prohibitive
Advanced agentic prompting.
It really depends on the way you use AI. If you just prompt it for a task and either accept or reject the output, you won't spend much.
But if you are like me, you aggressively document and brainstorm before planning, review that documentation with subagents, make modifications, aggressively plan, verify that plan with subagents, make modifications, break the work into a large number of phases, plan again for each phase, write tests to cover 100%, implement each phase, do intermediate and final code reviews with subagents, apply fixes, and write final documentation. Do all of this in parallel, with multiple tabs in your terminal each running Claude Code for 10-12 hours a day, and $5000 per day is not much.
If you use an Anthropic or OpenAI subscription and spend $1000 per month, you are not using AI much.
I spent $24,096.47 in "API" costs with my $200 Claude Code Max subscription in April.
I'm building my own SaaS. I spent 6 months writing the code by hand before using Claude, and that was fine, but it's much faster to give the exact specs to Claude and have 3-4 sessions working in parallel with me. When you validate changes with exact test specs there's much less correction you need to do. I always hit my weekly limit, and it's far cheaper for me to use this than to hire someone and spend time onboarding them.
> notice more and more people are consuming $1k in tokens a month
I've said it before: if you allow people to see how much others spent, they will try to climb up the "leaderboard".
It takes just ONE little praise for using tokens or one perk gained, and the GAME IS ON among the developers!
> I just can't figure out _how_ to burn that much money a month responsibly.
Well, if your bonus depends on spending it, you'll find a way.
My observation is: pasting long documents is a great way to burn tokens. A turn-based conversation, even a very deep and technical one, consumes fewer tokens than "read these logs and tell me where the problem is". Ironically, the log-reading example is a perfect use for a local LLM.
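A minimal sketch of that local-LLM version (assuming an Ollama server on its default port with some small model pulled - the names are placeholders):

```python
# Ask a cheap local model to triage a big log file instead of pasting it into a
# paid model's context.
import requests

def find_problem_in_logs(log_path: str, model: str = "llama3") -> str:
    with open(log_path) as f:
        logs = f.read()
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": "Read these logs and tell me where the problem is:\n\n" + logs,
            "stream": False,
        },
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]
```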
> I just can't figure out _how_ to burn that much money a month responsibly.
From my experience, this happens essentially by three means:
- Level 0 (beginner users) long-lived conversations: If you don't get in the habit of compacting, or otherwise manually forcing the model to summarize/checkpoint its work, you will often find people perpetually reusing the same conversation. This is especially true for _beginners_ who did not spend time curating their _base_ agent knowledge. They end up with a single meta-conversation with a huge context where they feel the agent is "educated", and feel like any new conversation with the agent is a waste of time because they have to re-educate it.
- Level 1 (intermediate users) heavy explicit use of subagents: Once you discover the prompt pattern of "spawn 5 subagents to analyze your solution, each analyzing a different angle, summarize their findings", it can become addictive. It's not a bad habit per se, but if you're not careful it can drastically overspend your credits.
- Level 2 (expert users) extreme multitasking: Just genuinely having 10 worktrees perpetually in parallel and cycling between them in between agent responses. Again, not necessarily bad in itself, but it can consume credits extremely fast.