snovv_crash • yesterday at 3:57 PM • 1 reply

The real improvement will be when the software engineers get into the training loop. Then we can have MoE models that use cache-friendly expert utilisation and maybe even learned prefetching for what the next experts will be.
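
To make "cache-friendly expert utilisation" concrete, here is a minimal sketch that replays a sequence of expert IDs through a fixed-size LRU cache and reports the hit rate. Everything in it is an assumption for illustration: the function name `lru_hit_rate`, the two toy traces, and the choice of LRU as the eviction policy. It does not describe any particular inference engine.

```python
from collections import OrderedDict


def lru_hit_rate(trace, capacity):
    """Fraction of expert lookups served from a fixed-size LRU cache.

    `trace` is a hypothetical sequence of expert IDs in the order they
    are requested; `capacity` is how many experts fit in fast memory.
    """
    cache = OrderedDict()
    hits = 0
    for expert_id in trace:
        if expert_id in cache:
            hits += 1
            cache.move_to_end(expert_id)   # mark as most recently used
        else:
            cache[expert_id] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(trace) if trace else 0.0


# Toy traces over 8 experts (made-up data, not measured routing).
scattered = [0, 1, 2, 3, 4, 5, 6, 7] * 4  # cycles through every expert
clustered = [0, 1, 0, 1, 2, 3, 2, 3] * 4  # reuses a small working set

print(lru_hit_rate(scattered, capacity=2))  # 0.0
print(lru_hit_rate(clustered, capacity=2))  # 0.5
```

With room for only two experts, the scattered trace never hits, while the clustered one hits half the time. A router trained to prefer the second kind of pattern is roughly what the comment is asking for.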

Replies

zozbot234 • yesterday at 4:29 PM

> maybe even learned prefetching for what the next experts will be

Experts are predicted by layer and the individual layer reads are quite small, so this is not really feasible. There's just not enough information to guide a prefetch.
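
A rough sketch of why the choice is only known layer by layer, assuming a conventional top-k router that scores experts from the current hidden state. The function name `top_k_experts`, the weights, and the hidden vectors are all made up for illustration.

```python
def top_k_experts(hidden, router_weights, k):
    """Return the indices of the k experts with the highest router score.

    Each score is a dot product of the hidden state with one row of
    hypothetical router weights.
    """
    scores = [sum(h * w for h, w in zip(hidden, row)) for row in router_weights]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Three experts, two-dimensional hidden state (toy numbers).
router = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

hidden_at_layer_n = [1.0, 0.0]
print(top_k_experts(hidden_at_layer_n, router, k=2))  # [0, 2]

# The next layer's router sees a different hidden state, which only
# exists once the previous layer has finished computing.
hidden_at_next_layer = [0.0, 1.0]
print(top_k_experts(hidden_at_next_layer, router, k=2))  # [1, 2]
```

Because the input to each router is the output of the layer before it, a prefetcher would have to guess the selection before that input exists.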
