Hacker News

ankit219 | today at 1:04 AM | 2 replies

The issue is that Claude Code is cheap because it uses the API's unused capacity. Circumventions like these hurt Anthropic in two ways: first, they can no longer estimate API demand, and second, other harnesses are burstier (e.g. parallel calls) than Claude Code, which screws over other legit users. Claude Code very rarely makes parallel calls for context commands etc., but these harnesses do.

Re the whole unused capacity point: that is the nature of inference on GPUs. In any cluster you can batch inputs (i.e. it takes the same time for, say, 1 query or 100, as they can be parallelized), and now continuous batching [1] exists. With the API and the bursty nature of its requests, clusters would sit at 40%-50% of peak API capacity. It makes sense to divert the idle part to subscriptions: that reduces API costs in the future and gives Anthropic a way to monetize unused capacity. But if everyone does it, there is no unused capacity left to manage and everyone loses.

[1]: https://huggingface.co/blog/continuous_batching
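
The unused-capacity argument above can be sketched as a toy model. This is only an illustration: the function names, the peak of 100 units, and the load numbers are all made up, and nothing here reflects how Anthropic actually schedules traffic. It assumes API traffic is served first and subscription traffic only gets whatever is left in each interval.

```python
from typing import List, Tuple


def spare_capacity(peak: float, api_load: List[float]) -> List[float]:
    """Capacity left in each interval after API traffic is served."""
    return [max(peak - load, 0) for load in api_load]


def serve_subscriptions(
    peak: float, api_load: List[float], sub_demand: List[float]
) -> Tuple[List[float], List[float]]:
    """Fill spare capacity with subscription demand.

    Returns (served, unmet) per interval. Assumes API traffic has priority.
    """
    spare = spare_capacity(peak, api_load)
    served = [min(d, s) for d, s in zip(sub_demand, spare)]
    unmet = [d - v for d, v in zip(sub_demand, served)]
    return served, unmet


def utilization(peak: float, *loads: List[float]) -> float:
    """Fraction of total capacity used across all intervals and load types."""
    total = sum(sum(step) for step in zip(*loads))
    return total / (peak * len(loads[0]))


if __name__ == "__main__":
    peak = 100                       # hypothetical cluster capacity per interval
    api = [40, 50, 45, 90, 40]       # hypothetical bursty API load
    subs = [30, 30, 30, 30, 30]      # hypothetical steady subscription demand

    served, unmet = serve_subscriptions(peak, api, subs)
    print(served)                          # [30, 30, 30, 10, 30]
    print(unmet)                           # [0, 0, 0, 20, 0]
    print(utilization(peak, api))          # 0.53
    print(utilization(peak, api, served))  # 0.79
```

In this toy setup the steady subscription load lifts utilization from 53% to 79%, but during the API burst in the fourth interval most subscription demand goes unserved. If subscription traffic becomes bursty too, there is less idle capacity to soak up and more intervals where someone loses.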


Replies

blitzar | today at 1:33 AM

Your suggested functionality is server side, not client side.

> it uses API's unused capacity

I see no waiting or scheduling on my usage - it runs at what appears to be full speed until I hit my 4 hour / 7 day limit, and then it stops.

Claude Code is cheap (via a subscription) because it is burning piles of investor cash, while making a bit back on API / pay-per-token users.

ehsanu1 | today at 1:09 AM

They have rate limits for this purpose. Many folks run Claude Code instances in parallel, which has roughly the same characteristics.
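
For a sense of how a rate limit can cap bursts from parallel callers, here is a minimal token bucket sketch. The token bucket is one common rate-limiting technique, used here purely as an example; the thread does not say what mechanism Anthropic uses, and the class name and parameters are hypothetical. Time is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Toy token bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.last = 0.0         # timestamp of the previous call, in seconds

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Return True if a request costing `cost` tokens may proceed at `now`."""
        elapsed = now - self.last
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=1.0, capacity=3.0)
    # Five parallel calls arriving at the same instant: only three get through.
    print([bucket.allow(now=0.0) for _ in range(5)])
    # Two seconds later the bucket has refilled enough for another call.
    print(bucket.allow(now=2.0))
```

Under a limit like this, one bursty harness and several parallel instances look the same to the server: both are capped by the bucket size and the refill rate.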
