Hacker News

oceanplexian | last Thursday at 4:24 PM

I run models with Claude Code (using the Anthropic API feature of llama.cpp) on my own hardware, and it works every bit as well as Claude did literally 12 months ago.

If you don't believe me and don't want to mess around with used server hardware, you can walk into an Apple Store today, pick up a Mac Studio, and do it yourself.
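A minimal sketch of that setup, assuming llama-server is listening on localhost:8080 and exposing the Anthropic-compatible /v1/messages endpoint the comment refers to: the client below is the stock anthropic Python SDK pointed at the local server instead of api.anthropic.com. The port, model name, and placeholder key are assumptions, not anything from the thread; Claude Code itself can be aimed the same way via its ANTHROPIC_BASE_URL environment variable.

    # Sketch: talk to a local llama-server through its Anthropic-compatible
    # /v1/messages endpoint, the same interface Claude Code would use.
    # Assumptions: server on localhost:8080; the key is a placeholder since
    # the local server does no auth; the model name is whatever you loaded.
    import anthropic

    client = anthropic.Anthropic(
        base_url="http://localhost:8080",  # local llama-server, not api.anthropic.com
        api_key="not-needed-locally",      # placeholder; ignored by the local server
    )

    message = client.messages.create(
        model="gpt-oss-120b",  # assumed model name; match what the server loaded
        max_tokens=512,
        messages=[{"role": "user", "content": "Summarize this repo's build steps."}],
    )
    print(message.content[0].text)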


Replies

Eggpants | last Thursday at 6:49 PM

I’ve been doing the same with GPT-OSS-120B and have been impressed.

Only gotcha is Claude Code expects a 200k context window while that model maxes out around 131k. I have to do a /compact when it gets close. I'll have to see if there is a way to set the max context window in CC.
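One rough way to watch that ceiling from the outside, sketched under the assumption that the model is served by llama-server on localhost:8080 (whose /tokenize endpoint reports exact token counts); the limit and margin numbers below are illustrative, not settings Claude Code exposes:

    # Sketch: ask the local llama-server how many tokens a transcript uses,
    # to decide when to compact before hitting the model's ~131k ceiling.
    # Assumptions: server on localhost:8080; CONTEXT_LIMIT and MARGIN are
    # illustrative values chosen here, not anything Claude Code provides.
    import requests

    CONTEXT_LIMIT = 131072  # GPT-OSS-120B's context window
    MARGIN = 8192           # headroom left for the model's reply

    def needs_compaction(transcript: str) -> bool:
        resp = requests.post(
            "http://localhost:8080/tokenize",
            json={"content": transcript},
        )
        resp.raise_for_status()
        n_tokens = len(resp.json()["tokens"])
        return n_tokens > CONTEXT_LIMIT - MARGIN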

Been pretty happy with the results so far as long as I keep the tasks small and self-contained.

icedchai | last Thursday at 11:52 PM

What's your preferred local model?