tacitusarc · last Thursday at 5:28 AM

I think maybe you mean something else when you say MoE. I interpret that as “Mixture of Experts,” a model architecture where a routing matrix is applied per layer to select which weights participate in that layer’s matmul. The “experts” are the weight columns that get selected, but calling them experts kinda muddies the waters IMO; it’s really just a sparsification strategy. With that kind of MoE you would almost certainly see different routing behavior as you added to the context.
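For concreteness, here’s a rough sketch of what I mean by per-layer routing (all names and shapes here are made up, and real implementations typically route tokens to gated FFN blocks rather than single weight matrices):

    import numpy as np

    def moe_layer(x, router_w, expert_ws, top_k=2):
        # x: (d_model,) token activation; router_w: (d_model, n_experts);
        # expert_ws: list of (d_model, d_model) expert weight matrices.
        logits = x @ router_w                      # router score per expert
        top = np.argsort(logits)[-top_k:]          # indices of the top-k experts
        gates = np.exp(logits[top] - logits[top].max())
        gates /= gates.sum()                       # softmax over selected experts
        # Output is the gate-weighted sum of the selected experts' matmuls.
        return sum(g * (x @ expert_ws[i]) for g, i in zip(gates, top))

Since the router scores depend on the activations, anything you add to the context shifts which experts fire.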

I might be misunderstanding you, but it seems like you think there are multiple models, with one dispatching to the others? I’m not sure what that sort of multi-agent architecture is called, but I think it would be modeled as tool calls (and I do believe the image-related stuff is handled by specialized models); see the sketch below.
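Something like this hypothetical harness, where the main model emits a structured call and the scaffolding routes it to a specialist (everything here is stubbed and invented for illustration):

    def image_model(prompt):
        return f"<image for: {prompt}>"            # stand-in for a real specialist model

    def dispatch(tool_call):
        tools = {"generate_image": image_model}    # registry of specialist models
        return tools[tool_call["name"]](tool_call["arguments"]["prompt"])

    print(dispatch({"name": "generate_image",
                    "arguments": {"prompt": "a cat"}}))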

In any case, I am saying that GPT-5 (or whichever model) is the one actually refusing the request. It is making that decision itself, and it only updates its behavior after higher-trust data confirming the user’s claim lands in its context.

