> I replicated David Ng's RYS method [...] found something I didn't expect.
> Transformers appear to have discrete "reasoning circuits" — contiguous blocks of 3-4 layers that act as indivisible cognitive units. Duplicate the right block and the model runs its reasoning pipeline twice. No weights change. No training. The model just thinks longer.
How did you not expect that if you read his post? That's literally what he discovered, two years ago.
For anyone interested, there's more meat in the post and comments from last week: https://news.ycombinator.com/item?id=47322887
That's explicitly not the unexpected part. Read the rest of the post.