
vessenes, last Sunday at 5:21 PM

Not a professional, but an avid researcher/reader.

These papers look promising, but a few initial strikes. First, the research itself was clearly done with agentic support; I'd guess from the blog post and the papers that the research was actually done by agents with human support. There are lots of persistent giveaways, like overcommitting to weird titles such as "Wind Tunnel", and all of the obvious turns of phrase in the Medium post unfortunately carry over into the papers themselves. This doesn't mean they're wrong, but I do think it means what they have is less info-dense and less obviously correct, given today's state of the art with agentic research.

Upshot of the papers: there's one main claim. Each layer of a well-trained transformer network performs a Bayesian 'update' and selection of the model's "truth" or preference, so deeper layers in the architecture mean more accuracy. Thinking models get a chance to refresh the context and go back to the start of the layers for further refinement.
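A toy sketch of what "one Bayesian update per layer" could look like, purely as an illustration and not taken from the papers: the three candidate answers, the fixed per-layer likelihood, and the function names `bayes_update` and `run_layers` are all assumptions made up for this example. It only shows that repeated weak updates sharpen a belief as depth grows; it says nothing about whether real transformers do this.

```python
def bayes_update(prior, likelihood):
    """One Bayesian update: posterior is proportional to prior * likelihood."""
    unnorm = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def run_layers(prior, likelihoods):
    """Apply one update per hypothetical 'layer'; return the belief after each."""
    beliefs = [prior]
    for lik in likelihoods:
        beliefs.append(bayes_update(beliefs[-1], lik))
    return beliefs


# Three candidate answers with a uniform prior.
# Assumption: every layer supplies the same weak evidence favoring answer 0.
prior = [1 / 3, 1 / 3, 1 / 3]
layer_likelihood = [0.5, 0.3, 0.2]
beliefs = run_layers(prior, [layer_likelihood] * 6)

for depth, belief in enumerate(beliefs):
    print(depth, [round(b, 3) for b in belief])
```

In this toy setup the probability on answer 0 rises with every added layer, which is the "deeper = more accurate" intuition in miniature. A second pass through the same stack, as with a thinking model, would just be more calls to `run_layers` starting from the last belief.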

There's a follow-up claim: that thinking of what the models are doing as solely updating weights for this Bayesian process will lead to more efficient training.

Data in the paper: I didn't read deeply enough to decide whether this whole "it's all Bayes all the way down" idea seems true to me. They show that if you ablate single layers, accuracy drops. But that is not news.

They do show significantly faster (per round) loss reduction using EM training vs SGD, but they acknowledge this converges to the same loss eventually (although their graphs do not show this convergence, btw), and crucially they do absolutely no reporting on compute required, or comparison with more modern methods.

Upshot: I think I'd skip this, and I kind of regret the time I spent reading the papers. It might be true, but a) so what, and b) we don't have anything falsifiable or genuinely useful out of the theory. Maybe if we could splice together different models in a new and cool way beyond merging layers, then I'd say we have something interesting out of this.