logoalt Hacker News

pton_xdyesterday at 8:35 PM3 repliesview on HN

As long as I have a Claude subscription, why do they care what harness I use to access their very profitable token inference business?


Replies

ankit219yesterday at 9:25 PM

Because your subscription depends on the very API business.

Anthropic's cogs is rent of buying x amount of h100s. cost of a marginal query for them is almost zero until the batch fills up and they need a new cluster. So, API clusters are usually built for peak load with low utilization (filled batch) at any given time. Given AI's peak demand is extremely spiky they end up with low utilization numbers for API support.

Your subscription is supposed to use that free capacity. Hence, the token costs are not that high, hence you could buy that. But it needs careful management that you dont overload the system. There is a claude code telemetry which identifies the request as lower priority than API (and probably decide on queueing + caching too). If your harness makes 10 parallel calls everytime you query, and not manage context as well as claude code, its overwhelming the system, degrading the performance for others too. And if everyone just wants to use subscription and you have no api takers, the price of subscription is not sustainable anyway. In a way you are relying on others' generosity for the cheap usage you get.

Its reasonable for a company to unilaterally decide how they monetize their extra capacity, and its not unjustified to care. You are not purchasing the promise of X tokens with a subscription purchase for that you need api.

show 2 replies
arjieyesterday at 8:53 PM

Demand-aggregation allows the aggregator to extract the majority of the value. ChatGPT the app has the biggest presence, and therefore model improvements in Claude will only take you so far. Everyone is terrified of that. Cursor et al. have already demonstrated to model providers that it is possible to become commoditized. Therefore, almost all providers are seeking to push themselves closer to the user.

This kind of thing is pretty standard. Nobody wants to be a vendor on someone else's platform. Anthropic would likely not complain too much about you using z.ai in Claude Code. They would prefer that. They would prefer you use gpt-5.2-high in Claude Code. They would prefer you use llama-4-maverick in Claude Code.

Because regardless of how profitable inference is, if you're not the closest to the user, you're going to lose sooner or later.

gbear605yesterday at 9:36 PM

Because Claude Code is not a profitable business, it's a loss leader to get you to use the rest of their token inference business. If you were to pay for Claude Code by using the normal API, it would be at least 5x the cost, if not more.

show 1 reply