Hacker News

7777777phil, today at 8:26 AM

This lines up with something I keep coming back to. Sara Hooker's research (1) shows that compact models now outperform much larger predecessors on many tasks, and that scaling laws only reliably predict pre-training loss, not downstream performance. A minimal transformer learning 10-digit addition is a neat data point for that thesis. I wrote about the broader implications (2).

The trillion-dollar scaling bet looks increasingly like it's hitting diminishing returns.

(1) https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5877662

(2) https://philippdubach.com/posts/the-most-expensive-assumptio...