
bethekidyouwant, today at 12:12 AM

Run what exactly?


Replies

all2, today at 12:39 AM

I'm assuming GP means 'run inference locally on a GPU or in system RAM'. You can run really big LLMs on local infra; they just generate a fraction of a token per second, so it might take all night to get a paragraph or two of text. Mix in things like thinking and tool calls, and it will take a long, long time to get anything useful out of it.
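For a rough sense of scale, here's a back-of-envelope sketch; the throughput and paragraph-length numbers are illustrative assumptions, not benchmarks:

    # Rough estimate of local-inference time at very low throughput.
    # Assumed: ~0.1 tokens/sec for a huge model spilling into system RAM,
    # ~300 tokens for a paragraph or two of output.
    tokens_needed = 300
    tokens_per_second = 0.1
    seconds = tokens_needed / tokens_per_second
    print(f"{seconds / 3600:.1f} hours")  # ~0.8 hours of wall-clock time

That's for plain text output; reasoning traces and tool-call loops can multiply the token count many times over, which is where the "all night" figure comes from.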
