vjerancrnjak • today at 5:23 PM

No. There is good signal in IMO gold medal performance.

These models actually learn distributed representations of nontrivial search algorithms.

A whole field of theorem proving, after decades of refinement, couldn’t even win a medal, yet 8B-param models are doing it very well.

The attention mechanism, a brute-force quadratic approach, combined with gradient descent, is actually discovering very efficient distributed representations of algorithms. I don’t think they can even be extracted and made into an imperative program.
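To make the "brute-force quadratic" part concrete, here is a minimal pure-Python sketch of single-head scaled dot-product attention. The function name `attention` and the `comparisons` counter are my own additions for illustration, not anything from a real library, and real implementations use batched matrix ops rather than loops:

```python
import math

def attention(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors.

    Returns the outputs plus a count of query-key dot products, which is
    only there to show the n * n cost (illustrative, not a standard API).
    """
    d = len(keys[0])
    outputs = []
    comparisons = 0
    for q in queries:
        # Every query is scored against every key: no pruning, no index.
        scores = []
        for k in keys:
            scores.append(sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d))
            comparisons += 1
        # Numerically stable softmax over this query's scores.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the softmax-weighted average of the value vectors.
        width = len(values[0])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)]
        )
    return outputs, comparisons
```

With n tokens this does n * n dot products, so doubling the sequence length quadruples the work. Nothing in the code encodes a search strategy; whatever algorithm comes out lives in the learned projections that produce the queries, keys, and values.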
