Hacker News

Usage-based pricing killing your vibe, here's how to roll your own local AI

34 points by Bender, yesterday at 6:19 PM | 32 comments

Comments

_345, yesterday at 6:48 PM

It's a seriously degraded experience from a developer's perspective. OK, you've finally got one local LLM installed after configuring everything perfectly; what happens when you want to run a second instance? Now you've blown past your VRAM and system RAM limits, and you're stuck with just one.

Furthermore, the model they recommend doesn't quite reach ~gpt-5.4-mini-level performance. That quality dip means you may as well just pay for something like Kimi K2.6 via OpenRouter if you want something ~>= Sonnet 4.6 in performance as a backup for when you run out of Anthropic/OpenAI usage.
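For reference, OpenRouter exposes an OpenAI-compatible chat completions endpoint, so the fallback is roughly this (the model slug below is just a guess; check their catalog for the exact Kimi ID):

    # hypothetical model slug; substitute whatever OpenRouter actually lists for Kimi
    curl https://openrouter.ai/api/v1/chat/completions \
      -H "Authorization: Bearer $OPENROUTER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{"model": "moonshotai/kimi-k2", "messages": [{"role": "user", "content": "Refactor this function."}]}'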

AussieWog93, yesterday at 9:56 PM

I've tried these small models and they're nowhere near as good as Claude or GPT-5.

The new ones running on a 16GB M1 are maybe GPT-4 level (with decent performance, to be fair).

I wonder if it's possible to make some hyper-overtuned model that, say, does nothing but program in Python, and get SOTA-ish performance in that narrow task.

roscas, yesterday at 9:46 PM

BTW, LMStudio and a few others are really amazing. They let you download models from HF and manage many details before loading them. A mid-range PC with an 8 or 10 GB graphics card is already a nice setup for running many models that are really good. You can also run Ollama, which is very simple to use and helps you code in VSCodium with Continue. Pretty nice! A minimal sketch of the Ollama route is below.
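The model tag here is just an example; pick whatever fits your VRAM:

    # pull a code-oriented model, then test the local API that tools like Continue talk to
    ollama pull qwen2.5-coder:7b
    curl http://localhost:11434/api/generate \
      -d '{"model": "qwen2.5-coder:7b", "prompt": "Write a hello world in Python.", "stream": false}'

Continue can then be pointed at the same model by choosing Ollama as the provider in its config.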

janice1999, yesterday at 6:59 PM

A 24GB Nvidia RTX 3090 TI is ~2000 euro.

roscas, yesterday at 9:37 PM

Local AI does not mean privacy or offline. Claude Code does not run offline; it needs an internet connection.

"./claude-2.1.126-linux-x64

Welcome to Claude Code v2.1.126

Unable to connect to Anthropic services

Failed to connect to api.anthropic.com: ECONNREFUSED

Please check your internet connection and network settings.

Note: Claude Code might not be available in your country. Check supported countries at https://anthropic.com/supported-countries"

Let me also add that most services sold as private will still connect to the internet. LMStudio and many others will try to get a connection. I can't remember a single one that doesn't connect to its own servers and send some kind of information.
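If you want to verify that yourself, something like this works (the process name in the grep is a guess; adjust it to whatever the binary is actually called on your system):

    # list open network sockets, no DNS/port-name resolution, filter for the app in question
    sudo lsof -i -nP | grep -i "lm studio"
    # or snapshot established connections system-wide and eyeball the remote hosts (Linux)
    ss -tnp state established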

efficax, yesterday at 7:36 PM

qwen3.6 does a good job locally, except it can take 20-30 minutes to respond to a prompt on a Mac Studio with 32 GB of RAM.
