This was probably already done at Google, Meta, X and OpenAI before they trained their LLMs.
There's actually a section in the Wikipedia page that explicitly says DeepSeek was trained on it.