This, right now, is making the case for OSS AI and local inference: $200/mo just to get rate limited makes an RTX 6000 Pro look cheap.
How well do local OSS models stack up to Claude?
What’s the depreciation on that RTX 6000, though?
New hardware keeps coming out with large performance gains.
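A rough break-even sketch for the subscription-vs-GPU comparison above. The $200/mo figure comes from the thread; the GPU price and resale value are placeholder assumptions, not quoted figures:

```python
# Back-of-the-envelope: months until a one-time GPU purchase matches
# a recurring subscription, with a crude straight-line depreciation term.
subscription_per_month = 200      # $200/mo subscription (from the thread)
gpu_price = 8000                  # assumed purchase price, swap in the real quote
resale_after_2yr = 4000           # assumed resale value, also a placeholder

# Ignoring depreciation: simple break-even point.
months_simple = gpu_price / subscription_per_month
print(f"break-even ignoring resale: {months_simple:.0f} months")

# Crediting resale value: effective cost drops, break-even comes sooner.
months_with_resale = (gpu_price - resale_after_2yr) / subscription_per_month
print(f"break-even net of resale:   {months_with_resale:.0f} months")
```

Power draw and the pace of new hardware would push these numbers around, but the shape of the argument is the same: the faster GPUs depreciate, the more the resale credit matters.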