aspenmartin • yesterday at 9:14 PM

I appreciate the data here, but I don't think the read is quite right.

Saying we have linear capability for super-linear cost compares an unbounded variable (dollars) to bounded instruments (because benchmarks saturate). On unbounded measures, growth is exponential; you can see METR time horizons double every ~4-7 months (https://metr.org/blog/2026-1-29-time-horizon-1-1/). And capability being proportional to log(compute) is what the scaling law predicts.
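For a sense of scale, a doubling time can be turned into a per-year multiplier. This is only a rough sketch: the function name is illustrative, and it assumes clean exponential growth with a constant doubling time, which the real data may not follow.

```python
def annual_growth_factor(doubling_months: float) -> float:
    """Per-year multiplier implied by a constant doubling time in months.

    Assumes idealized exponential growth: value(t) = value(0) * 2 ** (t / doubling_months).
    """
    return 2 ** (12 / doubling_months)


# The two ends of the ~4-7 month range quoted above.
for months in (4, 7):
    factor = annual_growth_factor(months)
    print(f"doubling every {months} months -> {factor:.1f}x per year")
# doubling every 4 months -> 8.0x per year
# doubling every 7 months -> 3.3x per year
```

So under that assumption, the quoted range works out to roughly 3x to 8x per year on the time-horizon measure.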

Epoch puts training cost growth at ~2.4x/year as your link shows. Meanwhile cost for fixed capability falls ~10-40x/year (https://epoch.ai/data-insights/llm-inference-price-trends), and lab revenue is growing ~10x/year! Anthropic went from $1B to $9B to $30B+ run rate in ~15 months, OpenAI ~$25B.

On [3]: the "destroying value" conclusion flips sign on an assumed 15% baseline rework rate. The report's most direct metric is +16% merged PRs per dev. The RCT evidence is genuinely mixed (METR: -19%, with n = 20 and Claude 3.x; Cui et al.: +26%), but it's just super hard to do this well. I think the Faros stuff was pretty cool. I hadn't seen it before, so thank you for the reference.

Replies

oudlys • yesterday at 9:49 PM

>"On unbounded measures, growth is exponential"

Maybe. There was a great comment in the thread on Fable 5 yesterday about benchmark comparisons between Fable and the latest Opus models. Here it is: https://news.ycombinator.com/item?id=48464600.

You could be right, but this is the most direct benchmark comparison I could find and it's not that strong.

>the "destroying value" conclusion flips sign on an assumed 15% baseline rework rate. The report's most direct metric is +16% merged PRs per dev.

I discuss this directly in my analysis. There's also an 860% increase in code churn. You only need 9% of that to be wasteful rework to drive throughput flat relative to the 15% rework baseline, not relative to an assumed ideal state where there was no rework.

But even if it were not true, a 16% throughput improvement is pretty weak given the investment - especially given the direct evidence of quality degradation. IMO.

I appreciate you reading my stuff and taking the data seriously. Thank you.

balefulboy • yesterday at 9:25 PM

METR's time horizon is not a reliable metric of LLM capability growth: https://www.transformernews.ai/p/against-the-metr-graph-codi...
