Hacker News

simgt · today at 9:59 AM

> I replicated David Ng's RYS method [...] found something I didn't expect.

> Transformers appear to have discrete "reasoning circuits" — contiguous blocks of 3-4 layers that act as indivisible cognitive units. Duplicate the right block and the model runs its reasoning pipeline twice. No weights change. No training. The model just thinks longer.

How did you not expect that if you read his post? That's literally what he discovered, two years ago.

For anyone interested, there's more meat in the post and comments from last week: https://news.ycombinator.com/item?id=47322887
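
For the curious, here's roughly what that block duplication looks like in code. This is a minimal sketch assuming GPT-2 via Hugging Face Transformers; the model choice, block indices, and prompt are my own illustrative assumptions, not details from either post:

```python
# Minimal sketch: duplicate a contiguous block of transformer layers
# in place. No weights change, no training. Model, block indices, and
# prompt are illustrative assumptions, not taken from the post.
import copy

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# Choose a contiguous block of decoder layers to duplicate (hypothetical).
start, end = 4, 8  # layers 4..7, a 4-layer block

layers = model.transformer.h  # nn.ModuleList of GPT2Block modules
copies = [copy.deepcopy(layers[i]) for i in range(start, end)]

# Splice the copies in directly after the originals, so the model runs
# that block twice on every forward pass.
model.transformer.h = torch.nn.ModuleList(
    list(layers[:end]) + copies + list(layers[end:])
)
model.config.n_layer = len(model.transformer.h)

prompt = "If Alice is older than Bob and Bob is older than Carol, then"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    # use_cache=False: the duplicated layers keep their original cache
    # indices, so KV-cache bookkeeping can't be trusted after splicing.
    out = model.generate(**inputs, max_new_tokens=30, use_cache=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Whether the output improves obviously depends on picking the "right" block, which is the whole question in the post.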


Replies

regularfry · today at 10:50 AM

That's explicitly not the unexpected part. Read the rest of the post.
