I think the fact that DeepSeek queries competitor models and trains on their outputs (i.e., distillation), along with its use of export-restricted Nvidia chips, helps explain how it can achieve such low training costs (USD 6 million vs. billions) while delivering only slightly worse performance than its American counterparts. It also undermines the narrative that DeepSeek or China poses a serious challenge to the U.S. lead in AI. The gap may be closing, but the initial reactions now look knee-jerk.
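To make the mechanism concrete: in its classic form (Hinton et al., 2015), distillation trains a smaller "student" model to match a larger "teacher" model's output distribution, which is far cheaper than learning from raw data alone. Below is a minimal PyTorch sketch of that soft-label variant; the tiny Linear models, temperature, and training loop are illustrative stand-ins, and in the API-scraping scenario people describe, the teacher's logits would have to be replaced by sampled text completions, since commercial APIs don't expose logits.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label knowledge distillation: the student is trained to
    match the teacher's softened output distribution via KL divergence."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_soft_student = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t^2 to keep gradients comparable across temperatures
    return F.kl_div(log_soft_student, soft_teacher,
                    reduction="batchmean") * (t * t)

if __name__ == "__main__":
    torch.manual_seed(0)
    teacher = torch.nn.Linear(16, 8)   # stand-in for a large frozen model
    student = torch.nn.Linear(16, 8)   # stand-in for a cheaper model
    opt = torch.optim.Adam(student.parameters(), lr=1e-2)
    x = torch.randn(32, 16)
    with torch.no_grad():
        teacher_logits = teacher(x)    # "competitor responses" play this role
    for _ in range(100):
        loss = distillation_loss(student(x), teacher_logits)
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f"final KD loss: {loss.item():.4f}")
```

The point of the sketch is the economics: the expensive part (the teacher's forward passes, or in practice the competitor's training run) is paid for by someone else, and the student only fits its outputs.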
It's unfortunate that the discussion has been hijacked and shifted to moral superiority, because that was never the point in the first place.
These models never cost billions to train, and I doubt the final training run for models like GPT-4 cost more than eight figures. Six million is definitely cheaper, and I would attribute much of that to distillation.