logoalt Hacker News

anonym29yesterday at 7:39 PM1 replyview on HN

similar numbers here - slightly higher PP. slightly better peak speed and retention w/ q8_0 kv cache quants too. llama-bench results here, cba to format for hn: https://pastebin.com/raw/zgJeqRbv

GTR 9 Pro, "performance" profile in BIOS, GTT instead of GART, Fedora 44


Replies

hedgehogyesterday at 9:31 PM

If I did a proper benchmark I think the numbers would be what you got. Minimax M2.7 is also surprisingly not that slow, and in some ways faster as it seems to get things right with less thinking output. (around 140 t/s prefill and 23 t/s generation).

show 1 reply