Hacker News

wrasee · 01/20/2025 · 7 replies

Except it’s not really a fair comparison, since DeepSeek is able to take advantage of a lot of the research pioneered by those companies with infinite budgets who have been researching this stuff in some cases for decades now.

The key insight is that those building foundational models and original research are always first, and then models like DeepSeek always appear 6 to 12 months later. This latest move towards reasoning models is a perfect example.

Or perhaps DeepSeek is also doing all their own original research and it’s just coincidence they end up with something similar yet always a little bit behind.


Replies

matthewdgreen · 01/20/2025

This is what many folks said about OpenAI when they appeared on the scene building on foundational work done at Google. But the real point here is not to assign arbitrary credit, it’s to ask how those big companies are going to recoup their infinite budgets when all they’re buying is a 6-12 month head start.

techload · 01/20/2025

You can learn more about DeepSeek and Liang Wenfeng here: https://www.chinatalk.media/p/deepseek-ceo-interview-with-ch...

byefruit · 01/20/2025

This is pretty harsh on DeepSeek.

There are some significant innovations behind v2 and v3, like multi-head latent attention (MLA), their many MoE improvements, and multi-token prediction.
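To make the MoE part concrete: a mixture-of-experts layer routes each input to a small top-k subset of expert networks and combines their outputs, weighted by renormalized router scores. Below is a toy, pure-Python sketch of that routing idea (the expert functions, router weights, and per-vector framing are illustrative assumptions, not DeepSeek's actual implementation, which routes per token inside a Transformer with load-balancing refinements):

```python
import math

def softmax(xs):
    m = max(xs)  # shift for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def moe_forward(x, experts, router_weights, k=2):
    """Route x to the top-k experts by router score, then return the
    gate-weighted sum of their outputs (gates renormalized over top-k)."""
    # Router: one linear score per expert.
    scores = [sum(w * xi for w, xi in zip(ws, x)) for ws in router_weights]
    gates = softmax(scores)
    topk = sorted(range(len(experts)), key=lambda i: gates[i], reverse=True)[:k]
    norm = sum(gates[i] for i in topk)
    out = [0.0] * len(x)
    for i in topk:
        y = experts[i](x)
        w = gates[i] / norm  # renormalize gates over the selected experts
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, topk

# Hypothetical experts and router for illustration.
experts = [
    lambda x: [2 * v for v in x],   # expert 0: doubles
    lambda x: [v + 1 for v in x],   # expert 1: shifts
    lambda x: [-v for v in x],      # expert 2: negates
]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
y, chosen = moe_forward([1.0, 2.0], experts, router_weights, k=2)
```

Only the chosen k experts run, which is the point: parameter count grows with the number of experts while per-token compute stays roughly constant.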

wrasee · 01/20/2025

Also, if you think some of the big names are playing fast and loose with copyright and personal data, don’t forget that DeepSeek operates in a regulatory environment with even less regard for such things, especially foreign copyright.

gizmo · 01/20/2025

Fast following is still super hard. No AI startup in Europe can match DeepSeek for instance, and not for lack of trying.

netdur · 01/20/2025

Didn't DeepSeek's CEO say that Llama is two generations behind, and that's why they didn't use their methods?