This is pretty harsh on DeepSeek.
There are some significant innovations behind v2 and v3, like multi-head latent attention, their many MoE improvements, and multi-token prediction.
I don’t think it’s that harsh. Nor do I deny that they’re a capable competitor and will surely mix in their own innovations.
But would they be where they are if they were not able to borrow heavily from what has come before?