
zozbot234 • yesterday at 4:29 PM • 2 replies

> maybe even learned prefetching for what the next experts will be

Experts are selected per layer, and the individual per-layer reads are quite small, so this is not really feasible. There's just not enough information to guide a prefetch.

Replies

yorwba • yesterday at 5:26 PM

It's feasible to put the expert routing logic in a previous layer. People have done it: https://arxiv.org/abs/2507.20984
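For anyone wondering what "routing in a previous layer" could look like, here is a minimal sketch of the general idea. It is not the linked paper's method. The assumption is that the router for layer l+1 reads the hidden state entering layer l, so its expert IDs are known one layer early and the weights can be fetched while layer l computes. `forward`, `prefetch`, and `run_layer` are hypothetical names, not any real library's API.

```python
def matvec(weights, x):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def top_k(scores, k):
    """Indices of the k largest scores, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def forward(hidden, routers, k, prefetch, run_layer):
    """Run a stack of MoE layers with one-layer-ahead routing.

    routers[l] is the (hypothetical) router matrix for layer l, one row
    per expert. Assumption: the router for layer l+1 is evaluated on the
    input of layer l, so its choice is known before layer l runs.
    """
    # Layer 0 has no earlier layer to look ahead from.
    experts = top_k(matvec(routers[0], hidden), k)
    for layer in range(len(routers)):
        next_experts = None
        if layer + 1 < len(routers):
            # Decide the next layer's experts from the current input ...
            next_experts = top_k(matvec(routers[layer + 1], hidden), k)
            # ... and start loading them before this layer's compute.
            prefetch(layer + 1, next_experts)
        hidden = run_layer(layer, hidden, experts)
        experts = next_experts
    return hidden
```

In a real system `prefetch` would kick off an asynchronous read and `run_layer` would wait on it; here both are plain callbacks so the ordering is easy to see.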

snovv_crash • yesterday at 4:34 PM

Manually, no. It would have to be learned, and the unpredictability of the expert selection would need to be a training metric to minimize.
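One way to read "a training metric to minimize" is as an auxiliary loss that penalizes disagreement between an early guess at the routing and the routing that actually happens. The sketch below is only an illustration under that assumption: `predictability_loss`, `total_loss`, and the default `weight` are made-up names and values, and cross-entropy is just one plausible choice of penalty.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predictability_loss(predicted_logits, actual_logits):
    """Cross-entropy of the early prediction against the actual routing.

    actual_logits: router scores computed at the layer itself.
    predicted_logits: scores from a (hypothetical) predictor that only
    sees an earlier layer. The loss is smallest when the two agree.
    """
    actual = softmax(actual_logits)
    predicted = softmax(predicted_logits)
    return -sum(p * math.log(q) for p, q in zip(actual, predicted))


def total_loss(task_loss, predicted_logits, actual_logits, weight=0.01):
    """Task loss plus the weighted predictability penalty.

    weight is an assumed hyperparameter, not a recommended value.
    """
    penalty = predictability_loss(predicted_logits, actual_logits)
    return task_loss + weight * penalty
```

The penalty trades model quality against predictability: a larger `weight` pushes the router toward choices an earlier layer can guess, which is what a prefetcher needs.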

