zozbot234 • today at 6:14 AM

AIUI, the main obstacle to maximizing performance with SSD offload is that existing GGUF files for MoE models are not necessarily laid out so that fetching a single MoE layer-expert can be done by reading a single sequential extent off the file. It may be that the GGUF format is already flexible enough in its layout that this can be achieved with a simple conversion; if not, the GGUF specification would have to be extended to allow such a layout.
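To make the "single sequential extent" point concrete, here is a minimal Python sketch. It does not parse real GGUF files. It assumes, purely for illustration, a hypothetical model in which every expert in every layer has three equally sized weight matrices (called `gate`, `up`, `down` here), and it compares two hypothetical layouts. In a tensor-major layout, each layer stores one stacked tensor per matrix kind holding all experts, so one expert's weights end up in three separate byte ranges. In an expert-major layout, each layer-expert's matrices sit back to back and coalesce into one read. All names and sizes are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    offset: int  # byte offset in the file
    length: int  # number of bytes


def coalesce(extents):
    """Merge byte ranges that are adjacent in the file into single sequential reads."""
    merged = []
    for e in sorted(extents, key=lambda x: x.offset):
        if merged and merged[-1].offset + merged[-1].length == e.offset:
            last = merged.pop()
            merged.append(Extent(last.offset, last.length + e.length))
        else:
            merged.append(e)
    return merged


# Hypothetical model shape, chosen only for illustration.
N_LAYERS, N_EXPERTS = 4, 8
MATS = ("gate", "up", "down")  # assumed per-expert weight matrices
MAT_BYTES = 1024               # assumed bytes per expert per matrix (uniform)


def tensor_major(layer, expert):
    """Hypothetical layout: per layer, one stacked tensor per matrix kind,
    each holding all experts. The expert's slices land in different tensors."""
    layer_base = layer * len(MATS) * N_EXPERTS * MAT_BYTES
    return [
        Extent(layer_base + m * N_EXPERTS * MAT_BYTES + expert * MAT_BYTES, MAT_BYTES)
        for m in range(len(MATS))
    ]


def expert_major(layer, expert):
    """Hypothetical layout: each (layer, expert) pair stores its matrices back to back."""
    base = (layer * N_EXPERTS + expert) * len(MATS) * MAT_BYTES
    return [Extent(base + m * MAT_BYTES, MAT_BYTES) for m in range(len(MATS))]


print(len(coalesce(tensor_major(2, 5))))  # 3 separate reads
print(len(coalesce(expert_major(2, 5))))  # 1 sequential read
```

Both layouts hold the same bytes; only their placement differs. With expert-major placement, loading one layer-expert from SSD becomes a single large sequential read instead of several smaller scattered ones, which is the property the comment above is after.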

Replies

adrian_b • today at 10:26 AM

You are right. That is why I do not intend to use a GGUF file but rather a set of files with a different layout, and that in turn is why I need to make changes to llama.cpp.
