Are you kidding? A good share of ST folks run finetunes of Mistral Nemo (if that tells you anything). Anyway, your core statement ("local chat sucks") is simply wrong.
From their own GitHub:
> If you intend to do LLM inference on your local machine, we recommend a 3000-series NVIDIA graphics card with at least 6GB of VRAM, but actual requirements may vary depending on the model and backend you choose to use.
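That "6GB of VRAM" figure is only a starting point, as the quote itself says. As a rough illustration (the ~12B parameter count for Mistral Nemo is public; the bits-per-weight and KV-cache numbers here are my own assumptions, not from that page), a back-of-envelope estimate looks like this:

```python
# Back-of-envelope VRAM estimate for a quantized local model.
# Assumptions for illustration only: ~12B parameters (Mistral Nemo),
# ~4.5 bits per weight for a mid-range quant, ~1 GB KV-cache allowance.

def estimate_vram_gb(params_b: float, bits_per_weight: float,
                     kv_cache_gb: float = 1.0) -> float:
    """Rough VRAM needed to hold the weights plus a KV-cache allowance."""
    weight_gb = params_b * 1e9 * bits_per_weight / 8 / 1e9
    return weight_gb + kv_cache_gb

if __name__ == "__main__":
    # ~12B parameters at ~4.5 bits/weight -> roughly 7-8 GB total,
    # which is why partial GPU offload is common on 6 GB cards.
    print(f"{estimate_vram_gb(12, 4.5):.1f} GB")
```

So a 6GB card can run smaller models fully on-GPU and larger ones with partial offload; the actual fit depends on the model and backend, exactly as their README says.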
Also, please be respectful when discussing technical matters.
P.S. I didn't say "local chat sucks".