
bethekidyouwant, today at 12:12 AM

Run what exactly?


Replies

all2, today at 12:39 AM

I'm assuming GP means 'run inference locally on a GPU or in system RAM'. You can run really big LLMs on local infra; they just generate a fraction of a token per second, so it might take all night to get a paragraph or two of text. Mix in things like thinking and tool calls, and it will take a long, long time to get anything useful out of it.
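For a rough sense of scale, here's a back-of-envelope sketch; the throughput and paragraph-length numbers are illustrative assumptions, not benchmarks:

    # Rough estimate of local-inference time at very low throughput.
    # Assumed: ~0.1 tokens/sec for a huge model spilling into system RAM,
    # ~300 tokens for a paragraph or two of output.
    tokens_needed = 300
    tokens_per_second = 0.1
    seconds = tokens_needed / tokens_per_second
    print(f"{seconds / 3600:.1f} hours")  # ~0.8 hours of wall-clock time

That's for plain text output; reasoning traces and tool-call loops can multiply the token count many times over, which is where the "all night" figure comes from.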
