Just a hunch, but I think the most cost effective “local” deployment method right now is renting GPU...

chatmasta • yesterday at 11:57 PM • 0 replies • view on HN

Just a hunch, but I think the most cost effective “local” deployment method right now is renting GPU clusters by the hour and running all the inference software on them yourself. This will be cheaper than capital expenditure on hardware that will depreciate and become last-gen, and cheaper than OpenRouter pay per token.

alt Hacker News