Lerc • today at 11:05 AM

That weird part is kind of what I was expecting.

This goes back to what I posted in the thread a couple of days ago: https://news.ycombinator.com/item?id=47327132

What you need is a mechanism to pick the right looping pattern. Then it really does look like mixture of experts on a different level.

Break the model into an input path, a thinking phase, and an output path, and make the thinking phase a single looping layer of many experts. Then the router gets to decide 13, 13, 14, 14, 15, 15, 16.
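To make the shape of that concrete, here is a rough sketch of the control flow only, not a real model. Everything in it is an assumption for illustration: `make_expert`, `scheduled_router`, and `run_model` are hypothetical names, the "experts" are toy affine maps standing in for layers 13 to 16, and the router just replays a fixed schedule where a trained one would decide from the hidden state.

```python
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]
# A router sees the current state and the step index, and returns the id of
# the expert to apply next, or None to leave the thinking loop.
Router = Callable[[Vector, int], Optional[int]]


def make_expert(scale: float, shift: float) -> Expert:
    """Hypothetical stand-in for one layer: an elementwise affine map."""
    return lambda x: [scale * v + shift for v in x]


def scheduled_router(schedule: List[int]) -> Router:
    """Untrained placeholder: replays a fixed schedule and ignores the state."""
    def route(state: Vector, step: int) -> Optional[int]:
        return schedule[step] if step < len(schedule) else None
    return route


def run_model(
    x: Vector,
    input_path: Expert,
    experts: Dict[int, Expert],
    router: Router,
    output_path: Expert,
    max_steps: int = 32,
) -> Tuple[Vector, List[int]]:
    """Input path, then one looping layer of experts, then output path."""
    h = input_path(x)
    trace: List[int] = []
    for step in range(max_steps):  # hard cap so a bad router cannot loop forever
        choice = router(h, step)
        if choice is None:
            break
        h = experts[choice](h)
        trace.append(choice)
    return output_path(h), trace


# Toy experts keyed by the layer ids used above; expert i just adds i.
experts = {i: make_expert(1.0, float(i)) for i in (13, 14, 15, 16)}
router = scheduled_router([13, 13, 14, 14, 15, 15, 16])
y, trace = run_model([0.0], lambda x: x, experts, router, lambda h: h)
print(trace)
```

The only interesting part is that `router` is called on every pass through the loop, so repeating or skipping a layer is just a routing decision. Swapping `scheduled_router` for something learned is the hard bit.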

Training the router left as an exercise to the reader.
