I wanted to believe, but anyone who has spent time trying to run models locally knows this is not going to be solved by two lines of Python running on ROCm, as the example suggests.
Curious to hear more. My experience is limited to llama.cpp on Apple silicon so far, but I have been eyeing the AMD ecosystem from afar.
I am running a 4x GPU rig at home (similar to a mining rig), doing everything from LLMs to content creation. I have learned a lot. Having an AI rig today is much like having an early PC in the 80s: you don't appreciate the possible uses until you have it in your hands.
All you need is a used GPU slapped onto any disused DDR4 mobo. The new 5060s, the 16GB models, can do basically everything now.
Uhmm... I have a local Ollama setup on Linux+AMD, and it was only a bit more involved than this sample, and only because I wanted to run everything in a container.
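For reference, a containerized Ollama setup on AMD is roughly this. A minimal sketch, assuming Docker and the ROCm kernel driver are already installed; the `ollama/ollama:rocm` image tag and the `/dev/kfd` / `/dev/dri` device paths are the commonly documented ones but may differ on your distro:

```shell
# Expose the AMD GPU device nodes to the container, persist models
# in a named volume, and publish the Ollama API port.
docker run -d \
  --device /dev/kfd \
  --device /dev/dri \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:rocm

# Then pull and chat with a model inside the container.
docker exec -it ollama ollama run llama3.2
```

That is the whole "involved" part; without the container it is basically just installing Ollama and running one command.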
If you mean that you can't just run the largest unquantized models, then it's indeed true.
Not entirely.
I am running Open WebUI + Ollama with a 7B model in a Proxmox LXC container. It consumes less than 2GB of memory (the GPU only has 4GB) and about 50% CPU. It is very usable, sometimes faster than online services at starting to give you the answer, and 100% offline.
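For anyone wanting to replicate this, a stack like that can be sketched as a Compose file. This is a hedged sketch, not the poster's actual config: the service names are assumptions, and you would still need to add GPU passthrough for your hardware and the LXC specifics on the Proxmox side.

```yaml
# Open WebUI talking to Ollama over the internal Docker network.
services:
  ollama:
    image: ollama/ollama
    volumes:
      - ollama:/root/.ollama   # persist downloaded models
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"            # web UI on host port 3000
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
volumes:
  ollama:
```

With a 7B model quantized to 4-bit, the weights fit in roughly 4GB, which is why it squeezes onto a 4GB GPU at all.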
If I replace the GPU with a faster one, I have no need to use online ones.