Chinese models have put a hole in the bottom of the boat. They might not be as good, but they are not bad either, or at least I had a hard time finding any issues with the DeepSeek v4 Flash and Pro variants. They get the job done, rarely giving up until they have finished what they set out to do.
So even for enterprise deployments, as the dust settles, CFOs and CTOs might find that deploying on an internal cluster of GPUs is far cheaper and more reliable for their organisational needs than paying someone else for burned tokens.
I’ve been using Kimi 2.6, GLM 5.1, MiniMax 2.7 and lately DeepSeek. I only spend $40 a month and I don’t see the point in paying for Opus/Codex.
Chinese models are really quite good at a lot of stuff.
> CFO/CTOs might find out that deploying on an internal cluster of GPUs is far cheaper and more reliable
I think you're right, especially if you're someplace that already has a data center, such as a university. It solves a lot of privacy concerns as well.
Qwen3.6:35b is good enough for a lot of stuff.
I just used ollama with a shell script to tackle my directory of papers/literature. I converted the first 6 pages of each document to PNG, handed them off to Qwen, and told it to spit out BibTeX, including the abstract. Two days later it was done, and I didn't spend anything on "tokens."
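For anyone who wants to try the same thing, here is a rough Python sketch of that workflow rather than the original shell script. Assumptions on my part: `pdftoppm` (from poppler) does the PDF-to-PNG step, Ollama is listening on its default local port, and the prompt text, the `papers` directory and the `library.bib` output file are all made up for illustration. The model tag is copied from the comment above; swap in whatever you have actually pulled.

```python
import base64
import json
import subprocess
import tempfile
import urllib.request
from pathlib import Path

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "qwen3.6:35b"  # tag as written in the comment; replace with a model you have pulled

# Hypothetical prompt; the original wording was not given.
PROMPT = (
    "These are the first pages of an academic paper. "
    "Return a single BibTeX entry for it, including an abstract field."
)


def pdftoppm_cmd(pdf, out_prefix, pages=6):
    """Build the pdftoppm command that renders pages 1..pages of a PDF to PNG."""
    return ["pdftoppm", "-png", "-f", "1", "-l", str(pages), str(pdf), str(out_prefix)]


def build_payload(image_paths, model=MODEL, prompt=PROMPT):
    """Build the JSON body for /api/generate, with images base64-encoded."""
    images = [
        base64.b64encode(Path(p).read_bytes()).decode("ascii") for p in image_paths
    ]
    return {"model": model, "prompt": prompt, "images": images, "stream": False}


def ask_ollama(payload, url=OLLAMA_URL):
    """POST the payload to a running Ollama server and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


def main(paper_dir="papers", out_file="library.bib"):
    """Render each PDF's first pages, ask the model for BibTeX, append to one file."""
    with open(out_file, "a", encoding="utf-8") as out:
        for pdf in sorted(Path(paper_dir).glob("*.pdf")):
            with tempfile.TemporaryDirectory() as tmp:
                subprocess.run(pdftoppm_cmd(pdf, Path(tmp) / "page"), check=True)
                pages = sorted(Path(tmp).glob("page-*.png"))
                out.write(ask_ollama(build_payload(pages)).strip() + "\n\n")


if __name__ == "__main__":
    main()
```

The output still needs a sanity check before you trust it as a bibliography, since nothing here validates that the model returned well-formed BibTeX.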
The Chinese models are only cheap on subsidized Chinese hosting. I have yet to find a USA-hosted Chinese model with a very clear value advantage over US models.
I am having a great experience with DeepSeek. In fact, it seems to perform better than Claude or Codex for my use case.
I don't see myself returning to Claude or Codex anytime soon.
I had been saying this on HN repeatedly: people are going to use the smartest models for coding. They don't care how cheap your tokens are if your model doesn't have the highest probability of solving their programming tasks.
And I was dead wrong. Now I mostly use DeepSeek Pro myself.