locally on what hardware? something like the new dgx spark, ryzen halo, or mac studio will cost you ~ $4k plus whatever you pay for power. at the rate AI is currently progressing, i think you'd be optimistic to consider that as having a 2 year depreciation.
for $4k, you can get 20 months of claude max 200. i'd take claude over the hardware.
anthropic will have something to worry about when you can run a local model on your macbook that can code. but i think we're quite a ways off from that.
people who can't afford Claude max 200 are using qwen 3.6 27b for local coding assistance already
Just a hunch, but I think the most cost effective “local” deployment method right now is renting GPU clusters by the hour and running all the inference software on them yourself. This will be cheaper than capital expenditure on hardware that will depreciate and become last-gen, and cheaper than OpenRouter pay per token.