Hacker News

xrd · yesterday at 10:12 PM · 4 replies

I wanted to believe, but anyone who has spent any time trying to run models locally knows this is not going to be solved by two lines of Python running on ROCm, as the example shows.


Replies

h4kunamata · yesterday at 11:39 PM

Not entirely.

I am running Open WebUI + Ollama with a 7B model in a Proxmox LXC container. It consumes less than 2 GB of RAM (the GPU only has 4 GB) and about 50% CPU. It is very usable, 100% offline, and sometimes faster than the online ones to start giving you an answer.

If I replace the GPU with a faster one, I'll have no need to use the online ones at all.
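A back-of-the-envelope sketch of why a 7B model is workable on a small card (my illustrative numbers, not the commenter's measurements): quantized weights take roughly params × bits / 8 bytes, before KV-cache and activation overhead.

```python
# Rough VRAM estimate for quantized LLM weights (illustrative only).
# An n-parameter model at b-bit quantization needs about n * b / 8 bytes
# for the weights alone; KV cache and activations add more on top.

def weight_gib(params: float, bits: int) -> float:
    """Approximate GiB needed to store the model weights alone."""
    return params * bits / 8 / 1024 ** 3

q4_7b = weight_gib(7e9, 4)
print(f"7B @ 4-bit: {q4_7b:.2f} GiB of weights")  # roughly 3.26 GiB
```

That only partially fits a 4 GB card, which is consistent with the modest GPU usage reported above: llama.cpp-based runners such as Ollama can offload some layers to the GPU and keep the rest in system RAM.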

wilkystyle · yesterday at 10:15 PM

Curious to hear more. My experience is limited to llama.cpp on Apple silicon so far, but I have been eyeing the AMD ecosystem from afar.

sandworm101 · today at 1:01 AM

I am running a 4x GPU rig at home (similar to a mining rig) doing everything from LLMs to content creation. I have learned a lot. Having an AI rig today is much like having an early PC in the 80s: you don't appreciate the possible uses until you have it in your hands.

All you need is a used GPU slapped onto any disused DDR4 mobo. New 5060s, the 16 GB models, can do basically everything now.

cyberax · today at 12:31 AM

Uhmm... I have a local Ollama setup on Linux+AMD, and it was only a bit more involved than this sample. And only because I wanted to run everything in a container.

If you mean that you can't just run the largest unquantized models, then yes, that's true.
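To put rough numbers on that last point (a sketch with illustrative figures, not from the thread): unquantized fp16 weights take 2 bytes per parameter, so memory scales linearly with model size and the largest models are far beyond consumer hardware.

```python
# Why "the largest unquantized models" are out of reach locally:
# fp16 weights alone take 2 bytes per parameter, before KV cache
# or activation memory is counted.

def fp16_weight_gib(params: float) -> float:
    """GiB needed just to hold fp16 weights."""
    return params * 2 / 1024 ** 3

for name, params in [("7B", 7e9), ("70B", 70e9), ("405B", 405e9)]:
    print(f"{name:>5}: {fp16_weight_gib(params):7.1f} GiB of weights")
```

A 70B model already needs on the order of 130 GiB for weights alone, so quantization (or a rack of GPUs) is the only way to run the big ones at home.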