Unless you're running on an H100 or 4x RTX 5090, you won't get decent output.
The best bang for the buck right now is subscribing to a token plan from Z.ai (GLM 5.1), MiniMax (MiniMax M2.7), or Alibaba Cloud (Qwen 3.6 Plus).
Running quantized models won't give you results comparable to Opus or GPT.
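
For a rough feel of why the hardware bar is that high, here's a back-of-the-envelope sketch of weight memory versus VRAM. The model sizes in the loop are hypothetical round numbers, not the parameter counts of any model named above, and the estimate covers weights only (KV cache and activations need extra room), so treat it as a lower bound rather than a sizing guide.

```python
# Rough estimate: memory needed just to hold model weights.
# Assumption: 1 GB = 10^9 bytes; KV cache and activations are ignored.

H100_GB = 80          # single H100 (80 GB variant)
RTX_5090_GB = 32      # per-card VRAM on an RTX 5090


def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB."""
    return params_billion * bits_per_weight / 8


def fits(needed_gb: float, available_gb: float) -> bool:
    """True if the weights alone fit in the given VRAM budget."""
    return needed_gb <= available_gb


# Hypothetical model sizes (billions of parameters), for illustration only.
for params in (32, 70, 120):
    for bits in (16, 8, 4):
        need = weight_memory_gb(params, bits)
        print(
            f"{params}B @ {bits}-bit: {need:.0f} GB | "
            f"1x H100: {fits(need, H100_GB)} | "
            f"4x 5090: {fits(need, 4 * RTX_5090_GB)}"
        )
```

Even where a 4-bit quant fits on paper, that only says it loads, not that the output quality holds up.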