I'm assuming GP means 'run inference locally on GPU or RAM'. You can run really big L...

all2 • today at 12:39 AM • 1 reply • view on HN

I'm assuming GP means 'run inference locally on GPU or RAM'. You can run really big LLMs on local infra, they just do a fraction of a token per second, so it might take all night to get a paragraph or two of text. Mix in things like thinking and tool calls, and it will take a long, long time to get anything useful out of it.

Replies

guerrilla • today at 4:08 AM

Yes, this is what I meant. People are running huge models at home now, I assumed people could do it on premises or in a data center if you're a business, presumably faster... but yeah it definitely depends on what time scales we're talking.

➕ show 2 replies

alt Hacker News

Replies