
reactordev · today at 2:07 AM · 0 replies

Many are aware; they just can’t offload it onto their hardware.

The 8B models are easy enough to run on an RTX card, which makes them a natural baseline for local inference. What llama.cpp does on an RTX 5080 at 40 t/s, Furiosa should do at 40,000 t/s or whatever… it’s an easy way to get a flat comparison across all the different hardware llama.cpp runs on.
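The comparison being described is just a throughput ratio: tokens generated divided by wall-clock time, the same tokens/s figure llama.cpp prints after a run. A minimal sketch of that arithmetic, using the 40 t/s and 40,000 t/s figures from above (the token counts and durations here are made up for illustration, not measurements):

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Throughput in tokens/s, the metric llama.cpp reports per run."""
    return tokens / seconds

# Hypothetical runs chosen to match the figures mentioned above.
rtx_5080 = tokens_per_second(4_000, 100.0)      # 40 t/s
furiosa = tokens_per_second(4_000_000, 100.0)   # 40,000 t/s

speedup = furiosa / rtx_5080
print(f"RTX 5080: {rtx_5080:.0f} t/s, Furiosa: {furiosa:.0f} t/s -> {speedup:.0f}x")
```

Because the metric is hardware-agnostic, the same number lets you line up a consumer GPU, an accelerator, or a laptop CPU on one axis.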