Kirby64 • yesterday at 8:36 PM

It’s not just old, it’s also tiny and quantized. It’s Llama 3.1 8B at 3/6-bit quant. This is the type of thing you can run on almost any device…

windexh8er • yesterday at 9:57 PM

I get that, but not at 15k tokens/s.
