Hacker News

hedgehog, yesterday at 4:36 PM (2 replies)

This one gets around 250 t/s prefill and 12.4 t/s generation in my testing.
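
For a rough sense of what those two rates mean for a single request, here is a back-of-the-envelope sketch. The prompt and output sizes (4000 and 500 tokens) are made-up illustrative values, not from my testing, and it assumes both rates stay constant, which they generally won't as the context grows.

```python
def estimate_seconds(prompt_tokens, output_tokens,
                     prefill_tps=250.0, gen_tps=12.4):
    """Estimate wall-clock time from the two throughput figures quoted above.

    Assumes constant rates; real throughput usually drops at longer contexts.
    """
    prefill_s = prompt_tokens / prefill_tps  # time to process the prompt
    gen_s = output_tokens / gen_tps          # time to generate the reply
    return prefill_s, gen_s, prefill_s + gen_s


# Hypothetical request: 4000-token prompt, 500-token reply.
prefill_s, gen_s, total_s = estimate_seconds(4000, 500)
print(f"prefill {prefill_s:.1f}s, generation {gen_s:.1f}s, total {total_s:.1f}s")
# prints: prefill 16.0s, generation 40.3s, total 56.3s
```

With numbers like these, generation dominates the wait unless the prompt is very long.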


Replies

muyuu, yesterday at 8:14 PM

interesting, might be worth having around although it is still pretty slow

anonym29, yesterday at 7:39 PM

similar numbers here - slightly higher PP. slightly better peak speed and retention w/ q8_0 kv cache quants too. llama-bench results here, cba to format for hn: https://pastebin.com/raw/zgJeqRbv

GTR 9 Pro, "performance" profile in BIOS, GTT instead of GART, Fedora 44
