> Reduce your expectations about speed and performance!
Wildly understating this part.
Even the best local models (ones you run on beefy 128GB+ RAM machines) get nowhere close to the sheer intelligence of Claude/Gemini/Codex. At worst these models will move you backwards and just increase the amount of work Claude has to do when your limits reset.
Exactly. The comparison benchmark in the local LLM community is often GPT _3.5_, and most home machines can’t achieve that level.
Maybe add to the Claude system prompt that it should work efficiently or else its unfinished work will be handed off to to a stupider junior LLM when its limits run out, and it will be forced to deal with the fallout the next day.
That might incentivize it to perform slightly better from the get go.
> intelligence
Whether it's a giant corporate model or something you run locally, there is no intelligence there. It's still just a lying engine. It will tell you the string of tokens most likely to come after your prompt based on training data that was stolen and used against the wishes of its original creators.
The best open models such as Kimi 2.5 are about as smart today as the big proprietary models were one year ago. That's not "nothing" and is plenty good enough for everyday work.