They are heavily dogfooding. Coding is needed for nearly everything that goes into building the next Claude model: orchestrating training, data processing, RL environments, evals, scaffolding, UI, APIs, automated experiments, cluster management, etc. Being good at coding lets them get the next model out faster, which then speeds up the one after that, and so on.
Making a model that's great at other kinds of knowledge/office work is coincidental; it doesn't feed back directly into improving the model.