
reactordev · today at 2:07 AM · 0 replies

Many are aware; they just can’t offload it onto their hardware.

The 8B models are easy enough to run on an RTX card, which makes them a natural baseline for local inference. What llama.cpp does on an RTX 5080 at 40 t/s, Furiosa should do at 40,000 t/s or whatever… it’s an easy way to get a flat comparison across all the different hardware llama.cpp runs on.
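The comparison being described is just a throughput ratio: tokens generated divided by wall-clock time, the same tokens/s figure llama.cpp prints after a run. A minimal sketch of that arithmetic, using the 40 t/s and 40,000 t/s figures from above (the token counts and durations here are made up for illustration, not measurements):

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Throughput in tokens/s, the metric llama.cpp reports per run."""
    return tokens / seconds

# Hypothetical runs chosen to match the figures mentioned above.
rtx_5080 = tokens_per_second(4_000, 100.0)      # 40 t/s
furiosa = tokens_per_second(4_000_000, 100.0)   # 40,000 t/s

speedup = furiosa / rtx_5080
print(f"RTX 5080: {rtx_5080:.0f} t/s, Furiosa: {furiosa:.0f} t/s -> {speedup:.0f}x")
```

Because the metric is hardware-agnostic, the same number lets you line up a consumer GPU, an accelerator, or a laptop CPU on one axis.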