EnPissant • today at 5:28 PM

You do not provide any comparison to llama.cpp with mmap.

You do not explain how any kind of predictor can work for MoE experts.

You do not explain how prediction can even be useful. I can predict the layers used in a dense model (all of them are used in order), but that doesn't help me much. It's still bottlenecked on bandwidth (hint: MoE doesn't change this).
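To put rough numbers on the bandwidth point, here is a back-of-the-envelope sketch. Every figure in it is a made-up assumption for illustration, not a measurement of any system, and `tokens_per_second_bound` is a hypothetical helper. It assumes the weights needed for each token are streamed from storage with nothing cached in RAM. Under that assumption, throughput is capped by bandwidth divided by bytes read per token, and nothing in that cap depends on knowing ahead of time which bytes will be needed.

```python
def tokens_per_second_bound(bytes_per_token: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode speed when every token must stream its weights."""
    if bytes_per_token <= 0 or bandwidth_bytes_per_s <= 0:
        raise ValueError("both arguments must be positive")
    return bandwidth_bytes_per_s / bytes_per_token


GB = 1e9

# Hypothetical numbers, chosen only for illustration.
storage_bandwidth = 5 * GB   # assumed read bandwidth, bytes per second
dense_bytes = 40 * GB        # assumed dense model: all weights read per token
moe_active_bytes = 10 * GB   # assumed MoE: only the routed experts read per token

dense_bound = tokens_per_second_bound(dense_bytes, storage_bandwidth)
moe_bound = tokens_per_second_bound(moe_active_bytes, storage_bandwidth)

print(f"dense bound: {dense_bound} tokens/s")
print(f"MoE bound:   {moe_bound} tokens/s")
```

With these assumed figures the MoE case reads fewer bytes per token, so its ceiling is higher, but it is still a ceiling set by bandwidth. A perfect predictor could at best overlap the reads with compute; it would not reduce the bytes that have to be moved.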
