logoalt Hacker News

zozbot234today at 6:14 AM1 replyview on HN

AIUI, the main obstacle to maximizing performance with SSD offload is that existing GGUF files for MoE models are not necessarily laid out so that fetching a single MoE layer-expert can be done by reading a single sequential extent off the file. It may be that the GGUF format is already flexible enough in its layout configuration that this is doable with a simple conversion; but if not, the GGUF specification would have to be extended to allow such a layout to be configured.


Replies

adrian_btoday at 10:26 AM

You are right, which is why I do not intend to use a GGUF file but a set of files with a different layout, and this is why I need to make changes in llama.cpp.

show 1 reply