Hacker News

airtonix, last Sunday at 5:06 AM (1 reply)

It might be free, private, and blazing fast (if you pick a model whose parameter count fits your GPU).

But you'll quickly notice that it doesn't come close to the quality of output, reasoning, and reflection you'd get from running the same model family at a significantly higher parameter count on hardware with over 128 GB of actual VRAM.

There isn't anything available locally that will let me load a 128 GB model and still get anything above 150 tokens/s.
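For a rough sense of why, here's a back-of-the-envelope sketch (plain Python; the bandwidth figures are illustrative assumptions, not benchmarks). In the memory-bound decode regime, every generated token has to read all the weights once, so sustained throughput is roughly memory bandwidth divided by weight size:

```python
# Back-of-the-envelope: to decode at T tokens/s with W GB of weights
# (memory-bound case), you need roughly T * W GB/s of memory bandwidth.
# Figures below are illustrative assumptions.

def required_bandwidth_gb_s(weights_gb: float, target_tps: float) -> float:
    """Bandwidth needed if each generated token reads all weights once
    (ignores KV-cache traffic, which only makes it worse)."""
    return weights_gb * target_tps

def expected_tps(weights_gb: float, bandwidth_gb_s: float) -> float:
    """Rough upper bound on decode speed for a given memory bandwidth."""
    return bandwidth_gb_s / weights_gb

# 128 GB of weights at 150 tokens/s would need ~19 TB/s of bandwidth:
print(required_bandwidth_gb_s(128, 150))   # 19200 GB/s

# What local-ish hardware might actually get (assumed, round numbers):
print(expected_tps(128, 800))    # ~6 tok/s  (unified-memory workstation, ~800 GB/s)
print(expected_tps(128, 3350))   # ~26 tok/s (single datacenter-class GPU, which
                                 #            can't even hold 128 GB of weights anyway)
```

Under those assumptions, nothing you can put on a desk gets anywhere near 150 tokens/s on a 128 GB model.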

The only thing local AI models seem to make sense for right now is Home Assistant, as a replacement for your Google Home/Alexa.

Happy to be proven wrong, but the effort-to-reward ratio just isn't there for local AI.


Replies

PeterStuer, last Sunday at 11:53 AM

Because most of the people squeezing a highly quantized small model into their consumer GPU don't realize they've left no room for the activations and KV cache, and end up stuck with a measly small context.
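To make that concrete, a rough sketch (plain Python; the architecture and memory numbers are illustrative assumptions, not a specific model) of how the KV cache eats whatever VRAM the quantized weights leave behind:

```python
# Rough sketch: how much context fits after the quantized weights are loaded?
# All numbers below are illustrative assumptions, not a specific model or card.

def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                             bytes_per_value: int = 2) -> int:
    """K and V are each stored per layer, per KV head, per head dimension."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value

def max_context_tokens(vram_gb: float, weights_gb: float, overhead_gb: float,
                       per_token_bytes: int) -> int:
    """Tokens of KV cache that fit in whatever VRAM the weights leave free."""
    free_bytes = (vram_gb - weights_gb - overhead_gb) * 1e9
    return max(0, int(free_bytes // per_token_bytes))

# Assumed setup: 24 GB consumer card, ~30B-class model quantized to 4-bit
# (~18 GB of weights), ~2 GB of runtime/activation overhead, fp16 KV cache.
per_tok_gqa = kv_cache_bytes_per_token(n_layers=60, n_kv_heads=8, head_dim=128)   # GQA model
per_tok_mha = kv_cache_bytes_per_token(n_layers=60, n_kv_heads=48, head_dim=128)  # no GQA

print(max_context_tokens(24, 18, 2, per_tok_gqa))  # ~16k tokens of context
print(max_context_tokens(24, 18, 2, per_tok_mha))  # ~2.7k tokens -- a "measly small context"
```

Under those assumptions, the card "fits" the model, but only a fraction of the advertised context window is actually usable.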