Hacker News

byefruit 01/20/2025

This is pretty harsh on DeepSeek.

There are some significant innovations behind v2 and v3, like multi-head latent attention, their many MoE improvements, and multi-token prediction.


Replies

wrasee 01/20/2025

I don’t think it’s that harsh. And I also don’t deny that they’re a capable competitor and will surely mix in their own innovations.

But would they be where they are if they were not able to borrow heavily from what has come before?
