Hacker News

Gareth321, yesterday at 9:56 AM

I agree this is a new space and prediction volatility is much higher. We have evidence going back to at least 2019 that improvements have been exponential (https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...). The benchmarks are all over the place because improvements don't happen in a straight line. Even composites aren't that useful because the last 10% improvement can require more effort than the first 90%.

To be frank, from what I can see, even if all progress stopped right now, it would take one to two decades to fully operationalise the existing potential of LLMs. That alone would mean massive economic and social change. But progress is not stopping, and by some measurements it continues to improve exponentially. I really think this is incredibly transformative, more so than anything humanity has ever experienced. In the last year, OpenAI and possibly Anthropic have been working on recursive self-improvement, meaning these models are designing better versions of themselves. This means we have effectively entered the singularity.


Replies

aspenmartin, yesterday at 3:51 PM

I agree with all of this -- the one nit I'll add is that scaling laws (e.g. Chinchilla, the classic paper on this, which still holds up) are defined in terms of next-token log loss on a held-out evaluation set during pretraining, and empirically follow very consistent power-law relationships with compute and data (there is an ideal mixture of compute and data, and what you scale is compute at that ideal mixture). That's all I mean by performance. As you observe, we also have benchmark performance trends, which are measured on the final model after post-training, RL stages, etc. Those follow less predictable relationships, but it's the pretraining loss that dominates anyway.
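To make the power-law form concrete, here is a minimal sketch of the Chinchilla-style loss model. The functional form L(N, D) = E + A/N^α + B/D^β and the coefficients below are the fit published in the Chinchilla paper (Hoffmann et al., 2022); treat the specific numbers as illustrative of the shape of the relationship, not as exact predictions for any particular model.

```python
# Chinchilla-style scaling law sketch: predicted pretraining next-token loss
# as a function of parameter count N and training tokens D.
# Coefficients are the published Chinchilla fit (illustrative only).
E = 1.69       # irreducible loss (entropy of the data)
A, ALPHA = 406.4, 0.34   # parameter-count term
B, BETA = 410.7, 0.28    # data term

def loss(n_params: float, n_tokens: float) -> float:
    """Predicted next-token log loss for a model with n_params parameters
    trained on n_tokens tokens, under the fitted power law."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# The compute-optimal recipe scales parameters and tokens together
# (roughly ~20 tokens per parameter under this fit):
for n in (1e9, 10e9, 70e9):
    d = 20 * n  # compute-optimal token budget at this parameter count
    print(f"N={n:.0e} params, D={d:.0e} tokens -> predicted loss {loss(n, d):.3f}")
```

Note how both reducible terms shrink smoothly as a power of scale while the loss asymptotes toward E, which is why the pretraining-loss trend is so much more predictable than downstream benchmark scores.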

I agree with all of this though